DECLARATION OF VISHWANATH R. IYER, Ph.D.
UNDER 37 C.F.R. § 1.132 

I, VISHWANATH R. IYER, Ph.D., declare and state as follows:

1. I am an Assistant Professor in the Section of 
Molecular Genetics and Microbiology, Institute of Cellular and 
Molecular Biology, University of Texas at Austin, where my 
laboratory currently studies global transcriptional control in 
yeast, gene expression programs during human cell 
proliferation, and genome-wide transcription factor targets in 
yeast and human. Immediately prior to this position, I spent 
four years as a postdoctoral fellow in the laboratory of 
Patrick O. Brown at Stanford University studying the
transcriptional programs of yeast and of human cells. My
curriculum vitae is attached hereto as Exhibit A. 

2. Beginning in Dr. Brown's laboratory, where I 
helped to develop the first whole genome arrays for yeast and 
early versions of highly representative cDNA arrays for human 
cells, and continuing to the present day, I have used 
microarray-based gene expression analysis as a principal 
approach in much of my research. 

3. Representative publications describing this 
work include: 



DeRisi J. et al., "Exploring the metabolic and
genetic control of gene expression on a genomic
scale," Science 278:680-686 (1997); 1

Marton et al., "Drug target validation and 
identification of secondary drug target effects 
using DNA microarrays," Nature Med. 4:1293-1301
(1998); 2

Iyer et al., "The transcriptional program in
the response of human fibroblasts to serum,"
Science 283:83-87 (1999); 3 and

Ross et al., "Systematic variation in gene
expression patterns in human cancer cell lines,"
Nature Genetics 24:227-235 (2000). 4

Two of the papers describe our use of microarray-based 
expression profiling to explore the metabolic reprogramming 
that occurs during major physiological changes, both in yeast 
(DeRisi et al., during the shift from fermentation to
respiration) and in human cells (Iyer et al., human
fibroblasts exposed to serum). One reference describes our
use of expression profile analysis in drug target validation 
and identification of secondary drug effects (Marton et al.). 
And one describes our use of expression profiling as a 
molecular phenotyping tool to discriminate among human cancer 
cells (Ross et al.). 

4. Whether used to elucidate basic physiological 
responses, to study primary and secondary drug effects, or to 
discriminate and classify human cancers, expression profiling 
as we have practiced it relies for its power on comparison of
patterns of expression. 



5. For example, we have demonstrated that we can 
use the presence or absence of a characteristic drug
"signature" pattern of altered gene expression in drug-treated
cells to explore the mechanism of drug action, and to identify 
secondary effects that can signal potentially deleterious drug 
side effects. As another example, we have demonstrated that 
gene expression patterns can be used to classify human tumor 
cell lines. While it is of course advantageous to know the 
biological function of the encoded gene products in order to 
reach a better understanding of the cellular mechanisms 
underlying these results, these pattern-based analyses do not 
require knowledge of the biological function of the encoded 
proteins. 

6. The resolution of the patterns used in such 
comparisons is determined by the number of genes detected: the 
greater the number of genes detected, the higher the 
resolution of the pattern. It goes without saying that higher 
resolution patterns are generally more useful in such 
comparisons than lower resolution patterns. With such higher 
resolutions comes a correspondingly higher degree of 
statistical confidence for distinguishing different patterns, 
as well as identifying similar ones. 

7. Each gene included as a probe on a microarray
provides a signal that is specific to the cognate transcript, 
at least to a first approximation. 5 Each new gene-specific 



5 In a more nuanced view, it is certainly possible for a probe to
signal the presence of a variety of splice variants of a single gene,
(Continued...)



probe added to a microarray thus increases the number of genes
detectable by the device, increasing the resolving power of 
the device. As I note above, higher resolution patterns are 
generally more useful in comparisons than lower resolution 
patterns. Accordingly, each new gene probe added to a 
microarray increases the usefulness of the device in gene 
expression profiling analyses. This proposition is so well- 
established as to be virtually an axiom in the art, and has 
been as long as I have been working in the field, and 
certainly since the time I embarked on the production of whole 
genome arrays in early 1996. Simply put, arrays with fewer 
gene-specific probes are inferior to arrays with more gene- 
specific probes. 

8. For example, our ability to subdivide cancers 
into discriminable classes by expression profiling is limited 
by the resolution of the patterns produced. With more genes 
contributing to the expression patterns, we can potentially 
draw finer distinctions among the patterns, thus subdividing 
otherwise indistinguishable cancers into a greater number of 
classes; the greater the number of classes, the greater the
likelihood that the cancers classified together will respond 
similarly to therapeutic intervention, permitting better 
individualization of therapy and, we hope, better treatment 
outcomes.

9. If a gene does not change expression in an 
experiment, or if a gene is not expressed and produces no 



(...Continued) 

without discriminating among them, and for a probe to signal the presence 
of a variety of allelic variants of a single gene, again without 
discriminating among them. 
signal in an experiment, that is not to say that the probe
lacks usefulness on the array; it only means that an 
insufficient number of conditions have been sampled to 
identify expression changes. In fact, an experiment showing 
that a gene is not expressed or that its expression level does 
not change can be equally informative. To provide maximum 
versatility as a research tool, the microarray should 
include — and as a biologist I would want my microarray to 
include — each newly identified gene as a probe. 



herein of my own knowledge are true and that all statements 
made on information and belief are believed to be true, and 
further that these statements were made with the knowledge 
that willful false statements and the like so made are 
punishable by fine or imprisonment, or both, under 
Section 1001 of Title 18 of the United States Code and may 
jeopardize the validity of any patent application in which 
this declaration is filed or any patent that issues thereon.
Vishwanath R. Iyer

Assistant Professor 

Section of Molecular Genetics and Microbiology 

Institute of Cellular and Molecular Biology 

MBB 3.212A, University of Texas at Austin 

Austin, TX 78712-0159 

Phone: 512-232-7833 

Fax: 512-232-3432 

Email: vishy@mail.utexas.edu 

Education/Training 

Bombay University, Mumbai, India B.Sc. (1987), Chemistry & Biochemistry 

M. S. University of Baroda, Baroda, India M.Sc. (1989), Biotechnology 

Harvard University, Cambridge MA Ph.D. (1996), Genetics

Stanford University, Stanford CA Post-doctoral (1996-2000), Genomics 

Research Experience 

9/00-5/03 Assistant professor, Section of Molecular Genetics and
Microbiology, University of Texas, Austin TX 

■ Global transcriptional control in yeast 

■ Gene expression programs during human cell proliferation 

■ Genome-wide transcription factor targets in yeast and human 

■ Collaborative microarray facility 

5/96-8/00 Post-doctoral fellow, Stanford University, Stanford CA
(Advisor: Dr. Patrick O. Brown) 

■ Yeast whole-genome ORF and intergenic microarrays 

■ Human cDNA microarrays for expression profiling 

9/89-4/96 Graduate student, Harvard University, Cambridge MA

(Advisor: Dr. Kevin Struhl) 

■ Yeast transcriptional regulation 



Honours and Awards 

Government of India Biotechnology Fellowship (1987-1989) 
University Grants Commission Junior Research Fellowship (1989) 
Stanford University/NHGRI Genome Training Grant (1996) 

Invited Conference talks (selected) 

Invited Lecturer, NEC-Princeton Lectures in Biophysics 

Princeton, NJ (June 1998) 
Plenary Session Speaker, HGM '99 (HUGO Human Genome Meeting) 

Brisbane, Australia (April 1999) 
Invited Speaker, Gordon Research Conference "Human Molecular Genetics" 

Newport, RI (August 2001) 



Invited Speaker, Nature Genetics "Oh

Dublin, Ireland (May 2002) 
Invited Speaker, "Pathology Bioinformatics" Symposium, University of Michigan, 

Ann Arbor, MI (November 2002) 
Invited Speaker, "Systems Biology: Genomic Approaches to Transcriptional 

Regulation" Cold Spring Harbor Laboratory Meeting (March 2003) 
Symposium co-Chair and Speaker "Functional Genomics" American Society for 

Biochemistry and Molecular Biology Meeting, San Diego, CA (April 2003) 
Invited Speaker in Functional Genomics (Gene Networks) Symposium, International 

Congress of Genetics, Melbourne Australia July 6-11 2003 
Invited Speaker "BioArrays Europe 2003" 

Cambridge, UK (Sep/Oct 2003) 

Departmental Seminars 

Texas A&M University Genetics and Biochemistry & Biophysics Departments, 
October 24 2002 

New York University School of Medicine, Department of Biochemistry, 

November 20 2002 
UT Southwestern Medical Center, Human Genetics Seminar Series, 

May 5 2002 

UCLA School of Medicine, Department of Human Genetics 

June 2 2003 
National Human Genome Research Institute 

June 12 2003 
Sanger Institute of the Wellcome Trust, Hinxton, UK 

Sep 2003 

Other Professional Activities 

Reviewer for Genome Biology, Genome Research, Nature Genetics, Science (1998- 
2003) 

Instructor, Cold Spring Harbor Summer Course "Making and using DNA Microarrays" 
(2000 - 2003) 

Member, NIDDK Special Emphasis Review Panel ZDK1 (2001-2002)



Publications 

1. Iyer V. & Struhl, K. (1995) Poly(dA:dT), a ubiquitous promoter element that

stimulates transcription via its intrinsic DNA structure, EMBO J. 14: 2570-2579. 

2. Iyer V. & Struhl, K. (1995) Mechanism of differential utilization of the his3 TR and TC 

TATA elements, Mol Cell Biol. 15: 7059-7066. 

3. Iyer V. & Struhl K. (1996) Absolute mRNA levels and transcription initiation rates in 

Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. (USA) 93:5208-5212.



4. DeRisi J. L., Iyer V. R. & Brown P. O. (1997) Exploring the metabolic and genetic

control of gene expression on a genomic scale. Science 278:680-686 

5. Marton M. J., DeRisi J. L., Bennett H. A., Iyer V. R., Meyer M. R., Roberts C. J.,
et al. (1998) Drug target validation and identification of secondary
drug target effects using DNA microarrays. Nature Med. 4:1293-1301

6. Lutfiyya L. L., Iyer V. R., DeRisi J., DeVit M. J., Brown P. O. & Johnston M. (1998)

Characterization of three related glucose repressors and genes they regulate in 
Saccharomyces cerevisiae. Genetics 150:1377-1391 

7. Spellman P. T., Sherlock G., Zhang M. Q., Iyer V. R., Anders K., Eisen M. B., Brown P.

O., Botstein D. & Futcher B. (1998) Comprehensive identification of cell cycle- 
regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. 
Mol. Biol. Cell 9:3273-3297 

8. Iyer V. R., Eisen M. B., Ross D. T., Schuler G., Moore T., Lee J. C. F., Trent J. M.,

Staudt L. M., Hudson Jr. J., Boguski M. S., Lashkari D., Shalon D., Botstein D. &
Brown P. O. (1999) The transcriptional program in the response of human
fibroblasts to serum. Science 283:83-87 

9. DeRisi J. L. & Iyer V. R. (1999) Genomics and array technology. Curr. Opin. Oncol. 

11:76-79 

10. Ross D. T., Scherf U., Eisen M. B., Perou C. M., Spellman P., Iyer V. R., Rees C.,
Jeffrey S. S., Van de Rijn M., Waltham M., Pergamenschikov A., Lee J. C. F., 
Lashkari D., Shalon D., Myers T. G., Weinstein J. N., Botstein D., & Brown P. O. 
(2000) Systematic variation in gene expression patterns in human cancer cell lines. 
Nature Genetics 24: 227-235 

11. Sudarsanam P., Iyer V. R., Brown P. O. & Winston F. (2000) Whole-genome
expression analysis of snf/swi mutants of S. cerevisiae. Proc. Natl. Acad. Sci. (USA)
97: 3364-3369 

12. Tran H. G., Steger D. J., Iyer V. R. & Johnson A. D. (2000) The chromo domain
protein Chd1p from budding yeast is an ATP-dependent chromatin-modifying factor.
EMBO J 19: 2323-2331 

13. Gross C., Kelleher M., Iyer V. R., Brown P. O., & Winge D. R. (2000) Identification
of the copper regulon in Saccharomyces cerevisiae by DNA microarrays. J. Biol. 
Chem. 275: 32310-32316 

14. Reid J. L., Iyer V. R., Brown P. O. & Struhl K. (2000) Coordinate regulation of yeast
ribosomal protein genes is associated with targeted recruitment of Esa1 histone
acetylase. Mol. Cell 6: 1297-1307 



15. Iyer V. R., Horak C., Scafe C. S., Botstein D., Snyder M. & Brown P. O. (2001)
Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF 
Nature 409: 533-538 

16. Miki R., Kadota K., Bono H., Mizuno Y., Tomaru Y., Carninci P., Itoh M., Shibata K., 
Kawai J., Konno H., Watanabe S., Sato K., Tokusumi Y., Kikuchi N., Ishii Y., 
Hamaguchi Y., Nishizuka I., Goto H., Nitanda H., Satomi S., Yoshiki A., Kusakabe
M., DeRisi J.L., Eisen M.B., Iyer V.R., Brown P.O., Muramatsu M., Shimada H.,
Okazaki Y. & Hayashizaki Y. (2001) Delineating developmental and metabolic 
pathways in vivo by expression profiling using the RIKEN set of 18,816 full-length 
enriched mouse cDNA arrays Proc. Natl. Acad. Sci. (USA) 98: 2199-2204 

17. Pollack J. R. & Iyer V. R. (2002) Characterizing the physical genome. Nature
Genetics 32 suppl: 515-521 

18. Iyer V. R. Microarray-based detection of DNA protein interactions: Chromatin 
Immunoprecipitation on Microarrays, in DNA Microarrays: A Molecular Cloning 
Manual (eds. Bowtell, D. & Sambrook, J.) 453-463 (Cold Spring Harbor Laboratory 
Press, 2003). 

* (not peer reviewed)

19. Killion, P., Sherlock G. and Iyer V. R. (2003) The Longhorn Array Database, an
open-source implementation of the Stanford Microarray Database BMC 
Bioinformatics 4: 32 

20. Hahn J. S., Hu Z., Thiele D. J. & Iyer V. R. Genome-Wide Analysis of the Biology of
Stress Responses Through Heat Shock Transcription Factor (submitted to PNAS) 

21. Kim J. & Iyer V. R. The global role of TBP recruitment to promoters in mediating
gene expression profiles (manuscript in preparation) 



Current/Pending Research Support 

U01 AA13518-01 Adron Harris (PI) 25% effort 

9/28/01 - 9/27/06 

NIH/NIAAA 

"INIA: Microarray Core" 

This proposal was a response to the Integrative Neuroscience Initiative on Alcoholism 

(INIA) RFA-AA-01-002. The overall goal is to support the use of microarray technology

to define changes in gene expression that either predict or accompany excessive alcohol 

consumption. 

Role: Co-investigator 



003658-0223-2001 Iyer (PI) 16% effort
01/01/02 - 08/31/04 

Texas Higher Education Coordinating Board (ARP) 

"Microarray based global mapping of DNA-protein interactions at promoters in human 
cells" 

This is a pilot project to map the in vivo interactions of transcription factors with human 

promoters 

Role: PI 



Information Technology Research 0325116 R. Mooney (PI) 9% effort 

09/01/03 - 08/31/07 

NSF 

"Feedback from Multi-Source Data Mining to Experimentation for Gene Network 

Discovery" 

Role: Co-investigator 



1 R01 CA95548-01A2 (pending) Iyer (PI) 25% effort 

12/1/03 - 11/30/08 

NIH 

"Analysis of genome-wide transcriptional control in yeast" 

This is a project to identify stress responsive transcription factor targets in yeast through 
the use of DNA microarrays 
Role: PI 



Breast Cancer Idea Award (pending) Iyer (PI) 10% effort 
1/1/04 - 12/31/06 

US Army Medical Research and Materiel Command 

"Genome-wide chromosomal targets of oncogenic transcription factors" 

This is a project aimed at identifying direct chromosomal targets of c-myc and ER in

human cells through the use of a novel sequence tag analysis method. 

Role: PI 

003658-0531-2003 (pending) Marcotte (PI) 8% effort 
01/01/04 - 12/31/05 

Texas Higher Education Coordinating Board (ATP) 

"Cell arrays: A novel high-throughput platform for measuring gene function on a 
genomic scale" 

This proposal is aimed at developing a novel microarray based platform for automated, 
high-throughput microscopic imaging of cells, allowing rapid and systematic evaluation 
of gene function. 
Exploring the Metabolic and Genetic Control of
Gene Expression on a Genomic Scale 

Joseph L. DeRisi, Vishwanath R. Iyer, Patrick O. Brown* 

DNA microarrays containing virtually every gene of Saccharomyces cerevisiae were used 
to carry out a comprehensive investigation of the temporal program of gene expression 
accompanying the metabolic shift from fermentation to respiration. The expression 
profiles observed for genes with known metabolic functions pointed to features of the 
metabolic reprogramming that occur during the diauxic shift, and the expression patterns 
of many previously uncharacterized genes provided clues to their possible functions. The 
same DNA microarrays were also used to identify genes whose expression was affected 
by deletion of the transcriptional co-repressor TUP1 or overexpression of the transcrip- 
tional activator YAP1. These results demonstrate the feasibility and utility of this ap- 
proach to genomewide exploration of gene expression patterns. 



The complete sequences of nearly a dozen 
microbial genomes are known, and in the 
next several years we expect to know the 
complete genome sequences of several 
metazoans, including the human genome. 
Defining the role of each gene in these 
genomes will be a formidable task, and un- 
derstanding how the genome functions as a 
whole in the complex natural history of a 
living organism presents an even greater 
challenge. 

Knowing when and where a gene is 
expressed often provides a strong clue as to 
its biological role. Conversely, the pattern 
of genes expressed in a cell can provide 
detailed information about its state. Al- 
though regulation of protein abundance in 
a cell is by no means accomplished solely 
by regulation of mRNA, virtually all dif- 
ferences in cell type or state are correlated 
with changes in the mRNA levels of many 
genes. This is fortuitous because the only 
specific reagent required to measure the 
abundance of the mRNA for a specific 
gene is a cDNA sequence. DNA microar- 
rays, consisting of thousands of individual 
gene sequences printed in a high-density 
array on a glass microscope slide (1, 2),
provide a practical and economical tool 
for studying gene expression on a very 
large scale (3-6). 

Saccharomyces cerevisiae is an especially 

Department of Biochemistry, Stanford University School
of Medicine, Howard Hughes Medical Institute, Stanford,
CA 94305-5428, USA.

*To whom correspondence should be addressed. E-mail:
pbrown@cmgm.stanford.edu



favorable organism in which to conduct a 
systematic investigation of gene expression. 
The genes are easy to recognize in the ge- 
nome sequence, cis regulatory elements are 
generally compact and close to the transcription
units, much is already known
about its genetic regulatory mechanisms, 
and a powerful set of tools is available for its 
analysis. 

A recurring cycle in the natural history 
of yeast involves a shift from anaerobic 
(fermentation) to aerobic (respiration) me- 
tabolism. Inoculation of yeast into a medi- 
um rich in sugar is followed by rapid growth 
fueled by fermentation, with the production 
of ethanol. When the fermentable sugar is 
exhausted, the yeast cells turn to ethanol as 
a carbon source for aerobic growth. This 
switch from anaerobic growth to aerobic 
respiration upon depletion of glucose, re- 
ferred to as the diauxic shift, is correlated 
with widespread changes in the expression 
of genes involved in fundamental cellular 
processes such as carbon metabolism, pro- 
tein synthesis, and carbohydrate storage 
(7). We used DNA microarrays to charac- 
terize the changes in gene expression that 
take place during this process for nearly the 
entire genome, and to investigate the ge- 
netic circuitry that regulates and executes 
this program. 

Yeast open reading frames (ORFs) were 
amplified by the polymerase chain reaction 
(PCR), with a commercially available set of 
primer pairs (8). DNA microarrays, con- 
taining approximately 6400 distinct DNA 
sequences, were printed onto glass slides by 



using a simple robotic printing device (9). 
Cells from an exponentially growing culture 
of yeast were inoculated into fresh medium 
and grown at 30°C for 21 hours. After an 
initial 9 hours of growth, samples were har- 
vested at seven successive 2-hour intervals, 
and mRNA was isolated (10). Fluorescently 
labeled cDNA was prepared by reverse tran- 
scription in the presence of Cy3(green)- 
or Cy5(red)-labeled deoxyuridine triphosphate
(dUTP) (11) and then hybridized to
the microarrays (12). To maximize the
reliability with which changes in expression
levels could be discerned, we labeled cDNA 
prepared from cells at each successive time 
point with Cy5, then mixed it with a Cy3- 
labeled "reference" cDNA sample prepared 
from cells harvested at the first interval 
after inoculation. In this experimental de- 
sign, the relative fluorescence intensity 
measured for the Cy3 and Cy5 fluors at 
each array element provides a reliable mea- 
sure of the relative abundance of the corre- 
sponding mRNA in the two cell popula- 
tions (Fig. 1). Data from the series of seven 
samples (Fig. 2), consisting of more than 
43,000 expression-ratio measurements, 
were organized into a database to facilitate 
efficient exploration and analysis of the 
results. This database is publicly available 
on the Internet (13). 
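
The two-color measurement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene names and intensity values are hypothetical, background subtraction and normalization are assumed to have been done already, and the log2 transform is a common convention rather than something the text specifies. As a consistency check on the reported figures, seven samples times approximately 6400 array elements gives at most about 44,800 ratios, in line with the "more than 43,000" measurements reported.

```python
import math


def expression_ratio(cy5, cy3):
    """Cy5/Cy3 ratio for one array element (time point vs. reference sample)."""
    if cy5 <= 0 or cy3 <= 0:
        raise ValueError("intensities must be positive")
    return cy5 / cy3


def log2_ratio(cy5, cy3):
    """Log2 of the ratio, so induction and repression are symmetric about 0."""
    return math.log2(expression_ratio(cy5, cy3))


# Hypothetical background-subtracted intensities: gene -> (Cy5, Cy3)
spots = {
    "GENE_A": (1200.0, 300.0),   # more abundant at the later time point
    "GENE_B": (250.0, 1000.0),   # less abundant at the later time point
    "GENE_C": (500.0, 500.0),    # unchanged
}

ratios = {gene: expression_ratio(cy5, cy3) for gene, (cy5, cy3) in spots.items()}
log_ratios = {gene: log2_ratio(cy5, cy3) for gene, (cy5, cy3) in spots.items()}
```

Because both samples hybridize to the same array element, the ratio is insensitive to how much DNA was printed at that spot, which is the practical appeal of the reference design.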

During exponential growth in glucose- 
rich medium, the global pattern of gene 
expression was remarkably stable. Indeed, 
when gene expression patterns between the 
first two cell samples (harvested at a 2-hour 
interval) were compared, mRNA levels dif- 
fered by a factor of 2 or more for only 19 
genes (0.3%), and the largest of these differences
was only 2.7-fold (14). However, as
glucose was progressively depleted from the 
growth media during the course of the ex- 
periment, a marked change was seen in the 
global pattern of gene expression. mRNA 
levels for approximately 710 genes were 
induced by a factor of at least 2, and the 
mRNA levels for approximately 1030 genes 
declined by a factor of at least 2. Messenger 
RNA levels for 183 genes increased by a 
factor of at least 4, and mRNA levels for 
203 genes diminished by a factor of at least 
4. About half of these differentially expressed
genes have no currently recognized
function and are not yet named. Indeed, 
more than 400 of the differentially ex- 
pressed genes have no apparent homology 
to any gene whose function is known (15).
The responses of these previously unchar- 
acterized genes to the diauxic shift therefore 
provide the first small clue to their possible
roles. 
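
The threshold counts quoted above (genes changing by a factor of at least 2 or at least 4) amount to a simple tally over the per-gene ratio series. The sketch below is a hypothetical illustration, not the authors' analysis code: the gene names and ratios are invented, and it assumes a gene counts as induced or repressed if it crosses the threshold at any time point.

```python
def count_changed(series_by_gene, fold):
    """Count genes induced or repressed by at least `fold` at any time point.

    `series_by_gene` maps a gene name to its expression ratios (sample / reference)
    across the time course.
    """
    induced = sum(1 for ratios in series_by_gene.values() if max(ratios) >= fold)
    repressed = sum(
        1 for ratios in series_by_gene.values() if min(ratios) <= 1.0 / fold
    )
    return induced, repressed


# Hypothetical ratio series for four genes
example = {
    "GENE_A": [1.0, 1.2, 2.5, 4.5],   # induced more than fourfold
    "GENE_B": [1.0, 0.9, 0.6, 0.5],   # repressed exactly twofold
    "GENE_C": [1.0, 0.8, 0.4, 0.2],   # repressed more than fourfold
    "GENE_D": [1.0, 1.1, 0.9, 1.0],   # essentially unchanged
}

twofold = count_changed(example, 2)
fourfold = count_changed(example, 4)
```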

The global view of changes in expres- 
sion of genes with known functions pro- 
vides a vivid picture of the way in which 
the cell adapts to a changing environ- 
ment. Figure 3 shows a portion of the yeast 
metabolic pathways involved in carbon 
and energy metabolism. Mapping the 
changes we observed in the mRNAs en- 
coding each enzyme onto this framework 
allowed us to infer the redirection in the 
flow of metabolites through this system. 
We observed large inductions of the genes 
coding for the enzymes aldehyde dehydro- 
genase (ALD2) and acetyl-coenzyme 
A(CoA) synthase (ACS1), which func- 
tion together to convert the products of 
alcohol dehydrogenase into acetyl-CoA, 
which in turn is used to fuel the tricarbox- 
ylic acid (TCA) cycle and the glyoxylate 
cycle. The concomitant shutdown of tran- 
scription of the genes encoding pyruvate 
decarboxylase and induction of pyruvate 
carboxylase rechannels pyruvate away 
from acetaldehyde, and instead to oxalac- 
etate, where it can serve to supply the 
TCA cycle and gluconeogenesis. Induc- 
tion of the pivotal genes PCK1, encoding 
phosphoenolpyruvate carboxykinase, and 
FBP1, encoding fructose 1,6-biphosphatase,
switches the directions of two key
irreversible steps in glycolysis, reversing 
the flow of metabolites along the revers- 
ible steps of the glycolytic pathway toward 
the essential biosynthetic precursor, glucose-6-phosphate.
Induction of the genes
coding for the trehalose synthase and gly- 
cogen synthase complexes promotes chan- 
neling of glucose-6-phosphate into these 
carbohydrate storage pathways. 

Just as the changes in expression of 
genes encoding pivotal enzymes can pro- 
vide insight into metabolic reprogram- 
ming, the behavior of large groups of func- 
tionally related genes can provide a broad 
view of the systematic way in which the 
yeast cell adapts to a changing environ- 
ment (Fig. 4). Several classes of genes, 
such as cytochrome c-related genes and 
those involved in the TCA/glyoxylate cy- 
cle and carbohydrate storage, were coordi- 
nately induced by glucose exhaustion. In 
contrast, genes devoted to protein synthe- 
sis, including ribosomal proteins, tRNA 
synthetases, and translation, elongation, 
and initiation factors, exhibited a coordi- 
nated decrease in expression. More than 
95% of ribosomal genes showed at least 
twofold decreases in expression during the 
diauxic shift (Fig. 4) (13). A noteworthy 
and illuminating exception was that the 



genes encoding mitochondrial ribosomal
genes were generally induced rather than 
repressed after glucose limitation, highlighting
the requirement for mitochondrial
biogenesis (13). As more is learned about 
the functions of every gene in the yeast 
genome, the ability to gain insight into a 
cell's response to a changing environment 
through its global gene expression patterns 
will become increasingly powerful. 

Several distinct temporal patterns of ex- 
pression could be recognized, and sets of 
genes could be grouped on the basis of the 
similarities in their expression patterns. The 
characterized members of each of these 
groups also shared important similarities in 
their functions. Moreover, in most cases, 
common regulatory mechanisms could be 
inferred for sets of genes with similar expres- 
sion profiles. For example, seven genes 
showed a late induction profile, with mRNA 
levels increasing by more than ninefold at 



the last timepoint but less than threefold at 
the preceding timepoint (Fig. 5B). All of
these genes were known to be glucose-re- 
pressed, and five of the seven were previously 
noted to share a common upstream activat- 
ing sequence (UAS), the carbon source re- 
sponse element (CSRE) (16-20). A search 
in the promoter regions of the remaining two 
genes, ACR1 and IDP2, revealed that 
ACR1, a gene essential for ACS1 activity, 
also possessed a consensus CSRE motif, but 
interestingly, IDP2 did not. A search of the 
entire yeast genome sequence for the con- 
sensus CSRE motif revealed only four addi- 
tional candidate genes, none of which 
showed a similar induction. 

Examples from additional groups of 
genes that shared expression profiles are 
illustrated in Fig. 5, C through F. The 
sequences upstream of the named genes in 
Fig. 5C all contain stress response ele- 
ments (STRE), and with the exception 




Fig. 1. Yeast genome microarray. The actual size of the microarray is 18 mm by 18 mm. The 
microarray was printed as described (9). This image was obtained with the same fluorescent 
scanning confocal microscope used to collect all the data we report (49). A fluorescently labeled 
cDNA probe was prepared from mRNA isolated from cells harvested shortly after inoculation (culture 
density of <5 x 10 s cells/ml and media glucose level of 19 g/liter) by reverse transcription in the 
presence of Cy3-dUTP. Similarly, a second probe was prepared from mRNA isolated from cells taken 
from the same culture 9.5 hours later (culture density of —2 x 10 s cells/ml, with a glucose level of 
<0.2 g/liter) by reverse transcription in the presence of Cy5-dUTP. In this image, hybridization of the 
Cy3-dUTP-labeled cDNA (that is, mRNA expression at the initial timepoint) is represented as a green 
signal, and hybridization of Cy5-dUTP-labeled cDNA (that is, mRNA expression at 9.5 hours) is 
represented as a red signal. Thus, genes induced or repressed after the diauxic shift appear in this 
image as red and green spots, respectively. Genes expressed at roughly equal levels before and after 
the diauxic shift appear in this image as yellow spots. 
of HSP42, have previously been shown to
be controlled at least in part by these
elements (21-24). Inspection of the sequences
upstream of HSP42 and the two
uncharacterized genes shown in Fig. 5C, 
YKL026c, a hypothetical protein with 
similarity to glutathione peroxidase, and 
YGR043c, a putative transaldolase, re- 
vealed that each of these genes also pos- 
sess repeated upstream copies of the stress- 
responsive CCCCT motif. Of the 13 ad- 
ditional genes in the yeast genome that 
shared this expression profile [including 
HSP30, ALD2, OM45, and 10 uncharac- 
terized ORFs (25)], nine contained one or 
more recognizable STRE sites in their up- 
stream regions. 

The heterotrimeric transcriptional acti- 
vator complex HAP2,3,4 has been shown 
to be responsible for induction of several 
genes important for respiration (26-28). 
This complex binds a degenerate consensus 
sequence known as the CCAAT box (26). 
Computer analysis, using the consensus se- 
quence TNRYTGGB (29), has suggested 
that a large number of genes involved in 
respiration may be specific targets of 
HAP2,3,4 (30). Indeed, a putative 
HAP2,3,4 binding site could be found in 
the sequences upstream of each of the seven 
cytochrome c-related genes that showed 
the greatest magnitude of induction (Fig. 
5D). Of 12 additional cytochrome c-related 
genes that were induced, HAP2,3,4 binding 
sites were present in all but one. Signifi- 
cantly, we found that transcription of 
HAP4 itself was induced nearly ninefold 
concomitant with the diauxic shift. 
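
A search for a degenerate consensus such as TNRYTGGB can be sketched by expanding the standard IUPAC ambiguity codes into a regular expression and scanning both strands. This is a minimal illustration, not the program the authors used (which the text does not describe): the function names and the test sequence are hypothetical, and only the IUPAC codes needed for this consensus are included.

```python
import re

# IUPAC nucleotide codes needed for TNRYTGGB (subset of the full table)
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "B": "[CGT]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq, consensus):
    """Return sorted (start, strand) pairs for every consensus match.

    Start positions are 0-based on the forward strand. A lookahead is used
    so that overlapping matches are also reported.
    """
    pattern = re.compile("(?=" + "".join(IUPAC[base] for base in consensus) + ")")
    seq = seq.upper()
    width = len(consensus)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    hits += [(len(seq) - (m.start() + width), "-") for m in pattern.finditer(rc)]
    return sorted(hits)


# Hypothetical upstream fragment containing one forward-strand match
upstream = "GGGTCATTGGCAAA"
sites = find_sites(upstream, "TNRYTGGB")
```

Note that the consensus with R = A and Y = T contains ATTGG, the reverse complement of CCAAT, which is why scanning both strands matters when looking for CCAAT-box sites.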

Control of ribosomal protein biogenesis 
is mainly exerted at the transcriptional 
level, through the presence of a common 
upstream-activating element (UAS)
that is recognized by the Rap1 DNA-binding
protein (31, 32). The expression profiles
of seven ribosomal proteins are shown
in Fig. 5F. A search of the sequences 
upstream of all seven genes revealed consensus
Rap1-binding motifs (33). It has
been suggested that declining Rap1 levels
in the cell during starvation may be re- 
sponsible for the decline in ribosomal pro- 
tein gene expression (34). Indeed, we ob- 
served that the abundance of RAP1 
mRNA diminished by 4.4-fold, at about 
the time of glucose exhaustion. 

Of the 149 genes that encode known or 
putative transcription factors, only two, 
HAP4 and SIP4, were induced by a factor of
more than threefold at the diauxic shift. 
SIP4 encodes a DNA-binding transcrip- 
tional activator that has been shown to 
interact with Snfl, the "master regulator" of 
glucose repression (35). The eightfold induction
of SIP4 upon depletion of glucose
strongly suggests a role in the induction of 
downstream genes at the diauxic shift.

Although most of the transcriptional
responses that we observed were not pre- 
viously known, the responses of many 
genes during the diauxic shift have been 
described. Comparison of the results we 
obtained by DNA microarray hybridiza- 
tion with previously reported results there- 
fore provided a strong test of the sensitiv- 
ity and accuracy of this approach. The 
expression patterns we observed for previ- 
ously characterized genes showed almost 
perfect concordance with previously pub- 
lished results (36). Moreover, the differ- 
ential expression measurements obtained 
by DNA microarray hybridization were re- 
producible in duplicate experiments. For 
example, the remarkable changes in gene 
expression between cells harvested imme- 
diately after inoculation and immediately 
after the diauxic shift (the first and sixth 
intervals in this time series) were mea- 
sured in duplicate, independent DNA mi- 
croarray hybridizations. The correlation 
coefficient for two complete sets of expres- 
sion ratio measurements was 0.87, and for 
more than 95% of the genes, the expres-
sion ratios measured in these duplicate
experiments differed by less than a factor
of 2. However, in a few cases, there were
discrepancies between our results and pre- 
vious results, pointing to technical limita- 
tions that will need to be addressed as 
DNA microarray technology advances 
(37, 38). Despite the noted exceptions, 
the high concordance between the results 
we obtained in these experiments and 
those of previous studies provides confi- 
dence in the reliability and thoroughness 
of the survey. 
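
The two reproducibility figures quoted above (a correlation coefficient across duplicate hybridizations, and the fraction of genes whose ratios agree within a factor of 2) can be computed as in the following sketch. It is only an illustration: the source does not state whether the correlation was calculated on raw or log-transformed ratios, so the use of log2 ratios here is an assumption, and the function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def replicate_agreement(ratios_a, ratios_b, fold=2.0):
    """Compare two duplicate sets of expression ratios, gene by gene.

    Returns (correlation of log2 ratios, fraction of genes whose two
    ratios differ by less than `fold`). Working in log space is an
    assumption of this sketch, not a detail given in the text.
    """
    logs_a = [math.log2(r) for r in ratios_a]
    logs_b = [math.log2(r) for r in ratios_b]
    limit = math.log2(fold)
    within = sum(1 for a, b in zip(logs_a, logs_b) if abs(a - b) < limit)
    return pearson(logs_a, logs_b), within / len(logs_a)


if __name__ == "__main__":
    # Made-up ratios for four genes in two duplicate hybridizations.
    r, fraction = replicate_agreement([0.5, 1.0, 4.0, 9.0], [0.6, 1.1, 3.0, 2.0])
    print(round(r, 2), fraction)
```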

The changes in gene expression during 
this diauxic shift are complex and involve 
integration of many kinds of information 
about the nutritional and metabolic state 
of the cell. The large number of genes 
whose expression is altered and the diver- 
sity of temporal expression profiles ob- 
served in this experiment highlight the 
challenge of understanding the underlying 
regulatory mechanisms. One approach to 
defining the contributions of individual 
regulatory genes to a complex program of 
this kind is to use DNA microarrays to 
identify genes whose expression is affected 



Fig. 2. The section of the ar- 
ray indicated by the gray box 
in Fig. 1 is shown for each of 
the experiments described 
here. Representative genes 
are labeled. In each of the ar- 
rays used to analyze gene 
expression during the diauxic 
shift, red spots represent 
genes that were induced rel- 
ative to the initial timepoint, 
and green spots represent 
genes that were repressed 
relative to the initial timepoint. 
In the arrays used to analyze 
the effects of the tup1Δ mu-
tation and YAP1 overexpres-
sion, red spots represent
genes whose expression was 
increased, and green spots 
represent genes whose ex- 
pression was decreased by 
the genetic modification. Note 
that distinct sets of genes are 
induced and repressed in the 
different experiments. The 
complete images of each of 
these arrays can be viewed on 
the Internet (13). Cell density
as measured by optical densi- 
ty (OD) at 600 nm was used to 
measure the growth of the 
culture.

by mutations in each putative regulatory
gene. As a test of this strategy, we analyzed
the genomewide changes in gene expression
that result from deletion of the TUP1 gene.
Transcriptional repression of many genes by
glucose requires the DNA-binding repressor
Mig1 and is mediated by recruiting the tran-
scriptional co-repressors Tup1 and Cyc8/Ssn6
(39). Tup1 has also been implicated in
repression of oxygen-regulated, mating-type-
specific, and DNA-damage-inducible genes
(40).

Fig. 3. Metabolic reprogramming inferred from global analysis of changes in gene expression. Only key
metabolic intermediates are identified. The yeast genes encoding the enzymes that catalyze each step 
in this metabolic circuit are identified by name in the boxes. The genes encoding succinyl-CoA synthase 
and glycogen-debranching enzyme have not been explicitly identified, but the ORFs YGR244 and 
YPR184 show significant homology to known succinyl-CoA synthase and glycogen-debranching en- 
zymes, respectively, and are therefore included in the corresponding steps in this figure. Red boxes with 
white lettering identify genes whose expression increases in the diauxic shift. Green boxes with dark 
green lettering identify genes whose expression diminishes in the diauxic shift. The magnitude of 
induction or repression is indicated for these genes. For multimeric enzyme complexes, such as 
succinate dehydrogenase, the indicated fold-induction represents an unweighted average of all the 
genes listed in the box. Black and white boxes indicate no significant differential expression (less than 
twofold). The direction of the arrows connecting reversible enzymatic steps indicate the direction of the 
flow of metabolic intermediates, inferred from the gene expression pattern, after the diauxic shift. Arrows 
representing steps catalyzed by genes whose expression was strongly induced are highlighted in red. 
The broad gray arrows represent major increases in the flow of metabolites after the diauxic shift, 
inferred from the indicated changes in gene expression. 



Wild-type yeast cells and cells bearing
a deletion of the TUP1 gene (tup1Δ) were
grown in parallel cultures in rich medium
containing glucose as the carbon source.
Messenger RNA was isolated from expo-
nentially growing cells from the two pop-
ulations and used to prepare cDNA la-
beled with Cy3 (green) and Cy5 (red),
respectively (11). The labeled probes were
mixed and simultaneously hybridized to
the microarray. Red spots on the microar-
ray therefore represented genes whose
transcription was induced in the tup1Δ
strain, and thus presumably repressed by
Tup1 (41). A representative section of the
microarray (Fig. 2, bottom middle panel)
illustrates that the genes whose expression
was affected by the tup1Δ mutation were,
in general, distinct from those induced
upon glucose exhaustion [complete images
of all the arrays shown in Fig. 2 are avail-
able on the Internet (13)]. Nevertheless,
34 (10%) of the genes that were induced
by a factor of at least 2 after the diauxic
shift were similarly induced by deletion of
TUP1, suggesting that these genes may be
subject to TUP1-mediated repression by
glucose. For example, SUC2, the gene en-
coding invertase, and all five hexose trans-
porter genes that were induced during the
course of the diauxic shift were similarly
induced, in duplicate experiments, by the
deletion of TUP1.

The set of genes affected by Tup1 in this
experiment also included α-glucosidases,
the mating-type-specific genes MFA1 and
MFA2, and the DNA damage-inducible
RNR2 and RNR4, as well as genes involved
in flocculation and many genes of unknown
function. The hybridization signal corre-
sponding to expression of TUP1 itself was
also severely reduced because of the (in-
complete) deletion of the transcription unit
in the tup1Δ strain, providing a positive
control in the experiment (42).

Many of the transcriptional targets of
Tup1 fell into sets of genes with related
biochemical functions. For instance, al-
though only about 3% of all yeast genes
appeared to be TUP1-repressed by a factor
of more than 2 in duplicate experiments
under these conditions, 6 of the 13 genes
that have been implicated in flocculation
(15) showed a reproducible increase in
expression of at least twofold when TUP1
was deleted. Another group of related
genes that appeared to be subject to TUP1
repression encodes the serine-rich cell
wall mannoproteins, such as Tip1 and
Tir1/Srp1, which are induced by cold
shock and other stresses (43), and similar,
serine-poor proteins, the seripauperins
(44). Messenger RNA levels for 23 of the
26 genes in this group were reproducibly
elevated by at least 2.5-fold in the tup1Δ
strain, and 18 of these genes were induced
by more than sevenfold when TUP1 was
deleted. In contrast, none of 83 genes that
could be classified as putative regulators of
the cell division cycle were induced more
than twofold by deletion of TUP1. Thus,
despite the diversity of the regulatory sys-
tems that employ Tup1, most of the genes
that it regulates under these conditions
fall into a limited number of distinct func-
tional classes.
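
To give a feel for how strongly the flocculation genes stand out, the following sketch asks how likely it would be to see at least 6 of 13 genes derepressed if each gene independently had the genome-wide chance of about 3%. This is a back-of-the-envelope binomial calculation added for illustration; the paper reports no such test, and the independence of genes is an assumption of the sketch.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # 6 of 13 flocculation genes versus a background rate of about 3%.
    print(binomial_tail(6, 13, 0.03))
```

Under these assumptions the tail probability is on the order of one in a million, which is consistent with the text's point that Tup1 targets cluster in a few functional classes.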

Because the microarray allows us to
monitor expression of nearly every gene in
yeast, we can, in principle, use this ap-
proach to identify all the transcriptional
targets of a regulatory protein like Tup1. It
is important to note, however, that in any
single experiment of this kind we can only
recognize those target genes that are nor-
mally repressed (or induced) under the
conditions of the experiment. For in-
stance, the experiment described here an-
alyzed a MATα strain in which MFA1
and MFA2, the genes encoding the a-
factor mating pheromone precursor, are
normally repressed. In the isogenic tup1Δ
strain, these genes were inappropriately
expressed, reflecting the role that Tup1
plays in their repression. Had we instead
carried out this experiment with a MATa
strain (in which expression of MFA1 and
MFA2 is not repressed), it would not have
been possible to conclude anything re-
garding the role of Tup1 in the repression
of these genes. Conversely, we cannot dis-
tinguish indirect effects of the chronic
absence of Tup1 in the mutant strain from
effects directly attributable to its partici-
pation in repressing the transcription of a
gene.

Another simple route to modulating the
activity of a regulatory factor is to overex-
press the gene that encodes it. YAP1 en-
codes a DNA-binding transcription factor
belonging to the b-zip class of DNA-bind-
ing proteins. Overexpression of YAP1 in
yeast confers increased resistance to hydro-
gen peroxide, o-phenanthroline, heavy
metals, and osmotic stress (45). We ana-
lyzed differential gene expression between a
wild-type strain bearing a control plasmid
and a strain with a plasmid expressing YAP1
under the control of the strong GAL1-10
promoter, both grown in galactose (that is,
a condition that induces YAP1 overexpres-
sion). Complementary DNA from the con-
trol and YAP1 overexpressing strains, la-
beled with Cy3 and Cy5, respectively, was
prepared from mRNA isolated from the two
strains and hybridized to the microarray.
Thus, red spots on the array represent genes
that were induced in the strain overexpress-
ing YAP1.

Of the 17 genes whose mRNA levels
increased by more than threefold when
YAP1 was overexpressed in this way, five
bear homology to aryl-alcohol oxidoreduc-
tases (Fig. 2 and Table 1). An additional
four of the genes in this set also belong to
the general class of dehydrogenases/oxi-
doreductases. Very little is known about
the role of aryl-alcohol oxidoreductases in
S. cerevisiae, but these enzymes have been
isolated from ligninolytic fungi, in which
they participate in coupled redox reac-
tions, oxidizing aromatic and aliphatic
unsaturated alcohols to aldehydes with the
production of hydrogen peroxide (46, 47).
The fact that a remarkable fraction of the
targets identified in this experiment be-
long to the same small, functional group of
oxidoreductases suggests that these genes

Fig. 4. Coordinated reg- 
ulation of functionally re- 
lated genes. The curves 
represent the average in- 
duction or repression ra- 
tios for all the genes in 
each indicated group. 
The total number of 
genes in each group was 
as follows: ribosomal 
proteins, 112; translation 
elongation and initiation
factors, 25; tRNA synthetases (excluding mitochondrial synthetases), 17; glycogen and trehalose syn-
thesis and degradation, 15; cytochrome c oxidase and reductase proteins, 19; and TCA- and glyoxy-
late-cycle enzymes, 24.

Table 1. Genes induced by YAP1 overexpression. This list includes all the genes for which mRNA levels
increased by more than twofold upon YAP1 overexpression in both of two duplicate experiments, and 
for which the average increase in mRNA level in the two experiments was greater than threefold (50). 
Positions of the canonical Yap1 binding sites upstream of the start codon, when present, and the 
average fold-increase in mRNA levels measured in the two experiments are indicated. 
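
The inclusion rule stated in this caption (more than twofold in each of two duplicate experiments, and an average above threefold) amounts to a simple filter, sketched below. The gene names and fold changes in the example are invented for illustration, and taking "average" to mean the arithmetic mean is an assumption.

```python
def passes_table1_criteria(fold_a, fold_b, each_min=2.0, mean_min=3.0):
    """True if both duplicates exceed each_min and their mean exceeds mean_min."""
    return fold_a > each_min and fold_b > each_min and (fold_a + fold_b) / 2 > mean_min


def select_induced(duplicates):
    """Map gene -> mean fold increase for genes meeting the criteria.

    `duplicates` maps a gene name to its (experiment 1, experiment 2)
    fold increases. The result is ordered by decreasing mean.
    """
    selected = {
        gene: (a + b) / 2
        for gene, (a, b) in duplicates.items()
        if passes_table1_criteria(a, b)
    }
    return dict(sorted(selected.items(), key=lambda item: item[1], reverse=True))


if __name__ == "__main__":
    # Hypothetical measurements, not values from the paper.
    example = {"geneA": (6.5, 5.7), "geneB": (5.0, 1.8), "geneC": (2.5, 3.1)}
    print(select_induced(example))
```

Requiring both duplicates to pass, rather than only the mean, keeps a single outlying hybridization from placing a gene on the list.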



might play an important protective role
during oxidative stress. Transcription of a
small number of genes was reduced in the
strain overexpressing Yap1. Interestingly,
many of these genes encode sugar per-
meases or enzymes involved in inositol
metabolism.

Aminotriazole and 4-nitroquinoline resistance protein
6.1
YCLX08C
Hypothetical protein
YJR155W
YPL171C
148, 212
OYE3
NADPH dehydrogenase (old yellow enzyme), isoform 3
5.8
YLR460C
Homology to hypothetical proteins YCR102c and YNL134c
YMR251W
327
NAD(P)H oxidoreductase (old yellow enzyme), isoform 1
homolog
3.7
YOL126C
MDH2
Malate dehydrogenase
3.3

SCIENCE • VOL. 278 • 24 OCTOBER 1997 • www.sciencemag.org

We searched for Yap1-binding sites
(TTACTAA or TGACTAA) in the se-
quences upstream of the target genes we
identified (48). About two-thirds of the
genes that were induced by more than
threefold upon Yap1 overexpression had
one or more binding sites within 600 bases
upstream of the start codon (Table 1), sug-
gesting that they are directly regulated by
Yap1. The absence of canonical Yap1-bind-
ing sites upstream of the others may reflect
an ability of Yap1 to bind sites that differ
from the canonical binding sites, perhaps in
cooperation with other factors, or less like-
ly, may represent an indirect effect of Yap1
overexpression, mediated by one or more
intermediary factors. Yap1 sites were found
only four times in the corresponding region
of an arbitrary set of 30 genes that were not
differentially regulated by Yap1.
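
The contrast drawn here between induced genes and the arbitrary control set can be expressed as a one-sided Fisher exact test, sketched below. The counts are assumptions made only for this illustration: "about two-thirds" of the 17 induced genes is taken as 11, and the four sites in the control set are treated as four distinct genes out of 30. The paper itself reports no such test.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Returns the probability of observing `a` or more in the top-left
    cell when all row and column totals are held fixed.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    upper = min(row1, col1)
    favorable = sum(
        comb(row1, x) * comb(n - row1, col1 - x) for x in range(a, upper + 1)
    )
    return favorable / comb(n, col1)


if __name__ == "__main__":
    # Assumed counts: 11 of 17 induced genes with a site, 4 of 30 controls.
    print(fisher_one_sided(11, 6, 4, 26))
```

With these assumed counts the probability is well below 0.01, in line with the text's suggestion that many of the induced genes are direct Yap1 targets.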

Use of a DNA microarray to character-
ize the transcriptional consequences of
mutations affecting the activity of regula-
tory molecules provides a simple and pow-
erful approach to dissection and character-
ization of regulatory pathways and net-
works. This strategy also has an important
practical application in drug screening.
Mutations in specific genes encoding can-
didate drug targets can serve as surrogates
for the ideal chemical inhibitor or modu-
lator of their activity. DNA microarrays
can be used to define the resulting signa-
ture pattern of alterations in gene expres-
sion, and then subsequently used in an
assay to screen for compounds that repro-
duce the desired signature pattern.

DNA microarrays provide a simple and 
economical way to explore gene expres- 
sion patterns on a genomic scale. The 
hurdles to extending this approach to any 
other organism are minor. The equipment 




[Fig. 5 plots. x-axis: Time (hours). Legend entries include CYT1, COX5A, COX13, and QCR7.]

Fig. 5. Distinct temporal patterns of induction or repression help to group genes that share regulatory
properties. (A) Temporal profile of the cell density, as measured by OD at 600 nm, and glucose
concentration in the media. (B) Seven genes exhibited a strong induction (greater than ninefold) only at
the last timepoint (20.5 hours). With the exception of IDP2, each of these genes has a CSRE UAS. There
were no additional genes observed to match this profile. (C) Seven members of a class of genes marked
by early induction with a peak in mRNA levels at 18.5 hours. Each of these genes contains STRE motif
repeats in its upstream promoter region. (D) Cytochrome c oxidase and ubiquinol cytochrome c
reductase genes. Marked by an induction coincident with the diauxic shift, each of these genes contains
a consensus binding motif for the HAP2,3,4 protein complex. At least 17 genes shared a similar
expression profile. (E) SAM1, GPP1, and several genes of unknown function are repressed before the
diauxic shift, and continue to be repressed upon entry into stationary phase. (F) Ribosomal protein
genes comprise a large class of genes that are repressed upon depletion of glucose. Each of the genes
profiled here contains one or more RAP1-binding motifs upstream of its promoter. RAP1 is a transcrip-
tional regulator of most ribosomal proteins.



required for fabricating and using DNA
microarrays (9) consists of components
that were chosen for their modest cost and 
simplicity. It was feasible for a small group 
to accomplish the amplification of more 
than 6000 genes in about 4 months and, 
once the amplified gene sequences were in 
hand, only 2 days were required to print a 
set of 110 microarrays of 6400 elements 
each. Probe preparation, hybridization, 
and fluorescent imaging are also simple 
procedures. Even conceptually simple ex- 
periments, as we described here, can yield 
vast amounts of information. The value of 
the information from each experiment of 
this kind will progressively increase as 
more is learned about the functions of 
each gene and as additional experiments 
define the global changes in gene expres- 
sion in diverse other natural processes and 
genetic perturbations. Perhaps the greatest 
challenge now is to develop efficient 
methods for organizing, distributing, inter- 
preting, and extracting insights from the 
large volumes of data these experiments 
will provide.

tion, the bound DNA was denatured by a 2-min in-
cubation in distilled water at ~95°C. The slides were
then transferred into a bath of 100% ethanol at room
temperature, rinsed, and then spun dry in a clinical
centrifuge. Slides were stored in a closed box at
room temperature until used.

vessel, was inoculated with 2 ml of a fresh over-

We describe here a method for drug target validation and identification of secondary drug tar-
get effects based on genome-wide gene expression patterns. The method is demonstrated by
several experiments, including treatment of yeast mutant strains defective in calcineurin, im-
munophilins or other genes with the immunosuppressants cyclosporin A or FK506. Presence or
absence of the characteristic drug 'signature' pattern of altered gene expression in drug-treated
cells with a mutation in the gene encoding a putative target established whether that target was
required to generate the drug signature. Drug dependent effects were seen in 'targetless' cells,
showing that FK506 affects additional pathways independent of calcineurin and the im-
munophilins. The described method permits the direct confirmation of drug targets and recog-
nition of drug-dependent changes in gene expression that are modulated through pathways
distinct from the drug's intended target. Such a method may prove useful in improving the effi-
ciency of drug development programs.

Good drugs are potent and specific: that is, they must have
strong effects on a specific biological pathway and minimal ef- 
fects on all other pathways. Confirmation that a compound in- 
hibits the intended target (drug target validation) and the 
identification of undesirable secondary effects are among the 
main challenges in developing new drugs. Comprehensive 
methods that enable researchers to determine which genes or 
activities are affected by a given drug might improve the effi-
ciency of the drug discovery process by quickly identifying po-
tential protein targets, or by accelerating the identification of
compounds likely to be toxic. DNA microarray technology, 
which permits simultaneous measurement of the expression 
levels of thousands of genes, provides a comprehensive frame- 
work to determine how a compound affects cellular metabolism 
and regulation on a genomic scale 1 "". DNA microarrays that 
contain essentially every open reading frame (ORF) in the 
Saccharomyces cerevisiae genome have already been used success- 
fully to explore the changes in gene expression that accompany 
large changes in cellular metabolism or cell cycle progression 1 " 10 . 

In the modern drug discovery paradigm, which typically be- 
gins with the selection of a single molecular target, the ideal in- 
hibitory drug is one that inhibits a single gene product so 
completely and so specifically that it is as if the gene product 
were absent. Treating cells with such a drug should induce 
changes in gene expression very similar to those resulting from 
deleting the gene encoding the drug's target. Here we have com-
pared the genome-wide effects on gene expression that result
from deletions of various genes in the budding yeast S. cerevisiae
to the effects on gene expression that result from treatment
with known inhibitors of those gene products. Using the cal-
cineurin signaling pathway as a model system, we tested an ap-
proach that permits identification of genes that encode proteins
specifically involved in pathways affected by a drug. The FK506 
characteristic pattern, or 'signature', of altered gene expression 
was not observed in mutant cells lacking proteins inhibited by 
FK506 (for example, a calcineurin or FK506-binding-protein 
mutant strain), but was observed in mutants deleted for genes 
in pathways unrelated to FK506 action (for example, a cy- 
clophilin mutant strain). Conversely, the cyclosporin A (CsA) 
signature was not observed in CsA-treated calcineurin or cy- 
clophilin mutant strains, but was seen in an FK506-binding-pro- 
tein mutant strain treated with CsA. The method also 
demonstrates that FK506, a clinically used immunosuppressant, 
has 'off-target' effects that are independent of its binding to im- 
munophilins. Thus, the approach we describe may provide a 
way to identify the pathways altered by a drug and to detect 
drug effects mediated through unintended targets. 

Null mutants phenocopy drug-treated cells on a genomic scale 
To test whether a null mutation in a drug target serves as a 
model of an ideal inhibitory drug, we examined the effects on 
gene expression associated with pharmacological or genetic in- 
hibition of calcineurin function. Calcineurin is a highly con- 
served calcium- and calmodulin-activated serine/threonine 
protein phosphatase implicated in diverse processes dependent 
on calcium signaling 12,13. In budding yeast, calcineurin is re-
quired for intracellular ion homeostasis 14, for adaptation to pro-
longed mating pheromone treatment 15 and in the regulation of

Fig. 1 Model of antagonism of the calcineurin signaling pathway mediated
by FK506 and cyclosporin A (CsA). Calcineurin activity is composed of a cat-
alytic subunit (calcineurin A, encoded in yeast by the CNA1 and CNA2 genes),
and calcium-binding regulatory subunits calmodulin (CMD) and calcineurin B
(CnB). After entering cells, FK506 and CsA specifically bind and inhibit the
peptidyl-proline isomerase activity of their respective immunophilins, FK506
binding proteins (FKBP) and cyclophilins (CyP). The most abundant im-
munophilins in yeast (Fpr1 and Cph1) are thought to mediate calcineurin in-
hibition. Drug-immunophilin complexes bind and inhibit the calcium- and
calmodulin-stimulated phosphatase calcineurin. Among the substrates of cal-
cineurin are transcriptional activators that act to modulate gene expression.



the onset of mitosis". In mammals, calcineurin has been impli- 
cated in T-cell activation 12 , in apoptosis", in cardiac hypertro- 
phy" and in the transition from short-term to long-term 
memory' 8 . In both organisms, calcineurin activity is inhibited 
by FK506 and CsA, immunosuppressant drugs whose effects on 
calcineurin are mediated through families of intracellular recep-
tor proteins called immunophilins 12,20 (Fig. 1). To assess the ef-
fects of pharmacologic inhibition of calcineurin, wild-type S.
cerevisiae was grown to early logarithmic phase in the presence
or absence of FK506 or CsA. Isogenic cells, from which the
genes encoding the catalytic subunits of calcineurin (CNA1 and
CNA2) had been deleted 21 (referred to as the cna or calcineurin
mutant), were grown in parallel, in the absence of the drug.
Fluorescently-labeled cDNA was prepared by reverse transcrip-
tion of polyA+ RNA in the presence of Cy3- or Cy5-deoxynu-
cleotide triphosphates and then hybridized to a microarray
containing more than 6,000 DNA probes representing 97% of
the known or predicted ORFs in the yeast genome.
Simultaneous hybridization of Cy5-labeled cDNA from mock-
treated cells and Cy3-labeled cDNA from cells treated with 1
µg/ml FK506 allowed the effect of drug treatment on mRNA lev-
els of each ORF to be determined (Fig. 2a and b and data not
shown). Similarly, effects of the calcineurin mutations on the
mRNA levels of each gene were assessed by simultaneous hy-
bridization of Cy5-labeled cDNA from wild-type cells and Cy3-
labeled cDNA from the calcineurin mutant strain (Fig. 2c). For
each comparison of this kind, reported expression ratios are the
average of at least two hybridizations in which the Cy3 and Cy5
fluors were reversed to remove biases that may be introduced by
gene-specific differences in incorporation of the two fluors
(data not shown).
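
The reasoning behind fluor reversal can be made concrete with a small sketch. Suppose a gene-specific labeling bias multiplies the red/green (R/G) ratio by the same factor in both hybridizations; averaging the two log ratios with opposite signs then cancels that factor. The exact averaging scheme used by the authors is not given in the text, so the formula below, the use of log10, and the function name are assumptions of this illustration.

```python
import math


def dye_swap_log_ratio(rg_forward, rg_reversed):
    """Estimate log10(treated / reference) from a fluor-reversed pair.

    rg_forward:  R/G ratio with Cy5 (red) = reference, Cy3 (green) = treated.
    rg_reversed: R/G ratio with Cy5 (red) = treated, Cy3 (green) = reference.
    A multiplicative bias common to both R/G ratios cancels in the difference.
    """
    return 0.5 * (math.log10(rg_reversed) - math.log10(rg_forward))


if __name__ == "__main__":
    true_ratio = 4.0   # treated / reference, hypothetical
    bias = 1.5         # hypothetical gene-specific excess of red signal
    forward = bias * (1.0 / true_ratio)
    reverse = bias * true_ratio
    estimate = dye_swap_log_ratio(forward, reverse)
    print(10 ** estimate)  # recovers the bias-free ratio
```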

Treatment with FK506 in these growth conditions resulted in
a signature pattern of altered gene expression in which mRNA
levels of 36 ORFs changed by more than twofold
(http://www.rosetta.org). A very similar pattern of altered gene
expression was observed when the calcineurin mutant strain
was compared to wild-type cells. Comparison of the changes in
mRNA expression of each gene resulting from treatment of
wild-type cells with FK506 with mRNA expression changes re-
sulting from deletion of the calcineurin genes showed the con-
siderable similarity of the global transcript alterations in
response to the two perturbations (Fig. 2b-d). Quantification of
this similarity using the correlation coefficient (ρ) showed
large correlations between the FK506 treatment signature and
the calcineurin deletion signature (ρ = 0.75 ± 0.03), as well as
the CsA treatment signature (ρ = 0.94 ± 0.02), but not with a
randomly selected deletion mutant strain (deleted for the
YER071C gene: ρ = -0.07 ± 0.04; Fig. 2e). The FK506 treatment
signature was also compared with those of more than 40 other
deletion mutant strains or drug-treatments thought to affect
unrelated pathways, and none had statistically significant cor-
relations. These data establish that genetic disruption of cal-
cineurin function provides a close and specific phenocopy of
treatment with FK506 or CsA.

To avoid generalizing from a single example, we also com-
pared the effects of treatment of wild-type cells with 3-aminotri-
azole (3-AT) with the effects of deletion of the HIS3 gene. HIS3
encodes imidazoleglycerol phosphate dehydratase, which cat-
alyzes the seventh step of the histidine biosynthetic pathway in
yeast 22; 3-AT is a competitive inhibitor of this enzyme that trig-
gers a large transcriptional amino-acid starvation response 23.
Microarray analysis of wild-type and isogenic his3-deficient
strains demonstrated the expected large genome-wide transcrip-
tional responses (involving more than 1,000 ORFs) resulting
from treatment with 3-AT (Fig. 3a) or from HIS3 deletion (Fig.
3c). Quantitative comparison of the 3-AT treatment signature
and the his3 mutant signature showed a high level of correlation
(ρ = 0.76 ± 0.02) that even extended to genes that experienced
small changes in expression level (Fig. 3b). As a negative control,
the correlations between the 3-AT treatment signature or the
his3 mutant signature and the calcineurin mutant strain were
not statistically significant (ρ = 0.09 ± 0.06 and -0.01 ± 0.04, re-
spectively). That both the calcineurin/FK506 and the his3/3-AT
comparisons were highly correlated indicates that in many cases
the expression profile resulting from a gene deletion closely re-
sembles the expression profile of wild-type cells treated with an
inhibitor of that gene's product.

'Decoder' strategy: Drug target validation with deletion mutants 

Because pharmacological inhibition of different targets might 
give similar or identical expression profiles, simple comparison 
of drug signatures to mutant signatures is unlikely to unambigu- 
ously identify a drug's target. To overcome this limitation, an 
additional 'decoder' step is used. We first compare the expres- 
sion profile of wild-type drug-treated cells to the expression pro- 
files from a panel of genetic mutant strains, using a correlation 
coefficient metric. Mutant strains whose expression profile is 
similar to that of drug-treated wild-type cells are selected and
subjected to drug treatment, generating the drug signature in 
the mutant strain (that is, the mutant drug signature). If the 
mutated gene encodes a protein involved in a pathway affected 
by the drug, we expect the drug signature in mutant cells to be 
different (or absent, for an ideal drug) from the drug signature 
seen in wild-type cells.

Fig. 2 Expression profiles from
FK506-treated wild-type (wt)
cells and a calcineurin-disruption
mutant strain share a genome-
wide correlation. DNA microarray
analysis showing changes in gene
expression resulting from FK506
treatment (a and b) or from ge-
netic disruption of genes encod-
ing calcineurin (c). a, Pseudo-
color image of the results of si-
multaneous hybridization of Cy5-
labeled cDNA (red) from
mock-treated strain R563 and Cy3-labeled cDNA
(green) from strain R563 treated with 1 µg/ml FK506.
b, Enlarged view of the boxed area in a. Arrowheads in-
dicate specific ORFs induced or repressed. c, Pseudo-
color image of the results of simultaneous hybridization
of Cy5-labeled cDNA (red) from strain R563 and Cy3-
labeled cDNA (green) from strain MCY300 (deleted for
the CNA1, CNA2 catalytic subunits of calcineurin).
Arrows indicate specific ORFs induced or repressed. d,
The log10 of the expression ratio for each ORF derived
from the FK506 treatment hybridizations is plotted ver-
sus the log10 of the expression ratio in the calcineurin
mutant hybridizations. ORFs that were induced or re-
pressed in both experiments are shown as green and
red dots, respectively. e, The log10 of the expression ratio for each ORF de-
rived from the FK506 treatment hybridizations is plotted versus the log10
of the expression ratio in the yer071c mutant hybridizations. No ORFs
were induced or repressed in both experiments.
[Panel label: wt vs. calcineurin mutant. Axis labels: log10 (R/G) calcineurin mutation; log10 (R/G) yer071c mutation.]



To illustrate this, we treated the his3 mutant strain with 3- 
AT. The signature pattern of altered gene expression resulting 
from treatment of the mutant strain with 3-AT was much less 
complex than that of the 3-AT signature in wild-type cells (Fig. 
4). This is seen simply by examining plots of mean intensity of 
the hybridization signal (which approximately reflects level of 
expression) versus the expression ratio for each ORF (Fig. 4). 
Genes that were expressed at higher or lower levels in 3-AT 
treated cells or in his3 mutant cells are shown as red and green 
dots, respectively. We analyzed the 3-AT signature in wild-type 
(Fig. 4a) and his3 mutant cells (Fig. 4c), as well as the his3 mu- 
tant strain signature (Fig. 4b). Whereas histidine limitation in- 
duced by 3-AT induced more than 1,000 transcription-level 
changes in the wild-type strain, few or no transcript level 
changes were induced by treatment of the his3-deletion strain
with 3-AT. This indicates that with the growth conditions used, 
essentially all of the effects of 3-AT depend on or are mediated 
through the HIS3 gene product. 

Applying this approach to the calcineurin signaling pathway 
showed the specificity of the method. The calcineurin mutant 
strain and strains with deletions in the genes encoding the 
most abundant immunophilins in yeast 12 (CPH1 and FPR1) 
were treated with either FK506 or CsA to determine the profiles 



Table 1 Signature correlation of expression ratios as a result of FK506
treatment in various mutant strains

Strain:        wild-type      cna             fpr1            cna fpr1       cph1
               +/- FK506      +/- FK506       +/- FK506       +/- FK506      +/- FK506
wild-type
+/- FK506      0.93 ± 0.04    -0.01 ± 0.07    -0.23 ± 0.07    0.12 ± 0.07    0.79 ± 0.03

Signature correlation shows the absence of the FK506 signature specifically in the calcineurin (cna) and fpr1
(major FK506 binding protein) deletion mutants. cna represents the mutant with deletions of the catalytic
subunits of calcineurin, CNA1 and CNA2. The correlation coefficient reported in the first column represents the
correlation between two pairs of hybridizations from independent wild-type +/- FK506 experiments.



of altered gene expression resulting from drug treatment of the
mutant cells (that is, mutant +/- drug). We compared the drug
signatures in the mutants to the wild-type drug signature using
the correlation coefficient metric (Table 1). Although the
signature generated by treatment of wild-type cells with FK506
was highly correlated to the calcineurin mutant strain signature
(ρ = 0.75 ± 0.03), it bore no similarity to the profile after
treatment of the calcineurin mutant strain with FK506 (ρ = -0.01
± 0.07). This indicates that FK506 was unable to elicit its
normal transcriptional response in the calcineurin mutant strain.
Likewise, treatment of the fpr1 mutant strain with FK506
elicited an expression profile that was not correlated to the
FK506 signature in the wild-type strain (ρ = -0.23 ± 0.07),
indicating that the FPR1 gene product is likely to be involved in
the pathway affected by FK506. The same was true for the cna fpr1
mutant strain. In contrast, treatment of the cph1 mutant strain
with FK506 generated an expression profile highly correlated
with the wild-type FK506 expression profile (ρ = 0.79 ± 0.03),
indicating that the cph1 mutation did not block the mode of
action of FK506 and thus that CPH1 is not directly involved in
the pathway affected by FK506. We tabulated the change in
expression in response to FK506 in different mutant strains for
all ORFs with expression ratios greater than 1.8 in FK506-treated
cells or in the calcineurin mutant strain (Fig. 5a). The
calcineurin mutant strain signature and the FK506 responses in
wild-type and the cph1 mutant strain are similar, and there are
no transcript-level changes (seen in black) for treatment of the
calcineurin, fpr1 and cna fpr1 mutant strains with FK506 (Fig. 5a).
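
The comparison above can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the study: the correlation values are the ones quoted in the text and in Table 1, while the function name `signature_status` and the cutoff of 0.4 are assumptions (the cutoff is chosen only to sit above the largest control correlation, 0.38, reported in Methods).

```python
# Correlation between the wild-type FK506 signature and the FK506 response
# of each strain, as quoted in the text and Table 1.
FK506_CORRELATIONS = {
    "wild-type": 0.93,
    "cna": -0.01,
    "fpr1": -0.23,
    "cna fpr1": 0.12,
    "cph1": 0.79,
}

# Assumed cutoff (not from the paper): just above the largest correlation
# seen with unrelated control profiles (0.38, see Methods).
CUTOFF = 0.4


def signature_status(correlations, cutoff=CUTOFF):
    """Label each strain by whether the wild-type drug signature is retained."""
    return {
        strain: "retained" if rho > cutoff else "absent"
        for strain, rho in correlations.items()
    }


status = signature_status(FK506_CORRELATIONS)

# Strains that lose the signature carry a deletion in a gene that is a
# candidate member of the pathway affected by the drug.
candidates = sorted(s for s, label in status.items() if label == "absent")
print(candidates)
```

With these inputs the strains flagged as losing the FK506 signature are cna, cna fpr1 and fpr1, in line with the interpretation given in the text; the cph1 strain retains the signature.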

Similar experiments and analyses with CsA 
provided further validation of this approach. 
The expression profile elicited by treatment 
of wild-type cells with CsA was highly corre- 

Fig. 3 Expression profiles from a his3 mutant strain and
wild-type (wt) cells treated with 3-AT share a genome-wide
correlation. DNA microarray analysis showing changes in gene
expression resulting from 3-AT treatment (a) or from genetic
disruption of the HIS3 gene (c). a, Pseudo-color image of the
results of simultaneous hybridization of Cy5-labeled cDNA (red)
from mock-treated wild-type strain R491 and Cy3-labeled cDNA
(green) from strain R491 treated with 10 mM 3-AT. b, The log10 of
the expression ratio for each ORF derived from the 3-AT treatment
hybridizations is plotted versus the log10 of the expression
ratio in the his3 mutant hybridizations. ORFs that were induced
or repressed in both experiments are shown as green and red dots,
respectively. The correlation of expression ratios applies not
only to genes with large expression ratios (for example, CHA1 and
ARG1), but also extends to genes with expression ratios less than
2 (for example, ILV1 and CPH1). ILV1 is induced 1.9-fold and
1.5-fold, and CPH1 is downregulated 1.9-fold and 1.7-fold, in
cells treated with 3-AT and his3 mutant cells, respectively. Two
ORFs do not fall on the line x = y. The leftmost point is the
HIS3 data point, which is induced by 3-AT treatment but which is
absent from the his3 mutant strain. The other point is YOR203W.
Both data points are labeled HIS3 because hybridization to
YOR203W is most likely due to HIS3 mRNA, as YOR203W overlaps the
HIS3 open reading frame. c, Pseudo-color image of the results of
simultaneous hybridization of Cy5-labeled cDNA (red) from
wild-type strain R491 and Cy3-labeled cDNA (green) from strain
R1226, deleted for the HIS3 gene. Arrowheads indicate specific
ORFs induced or repressed.
[Plot labels: wt +/- 10 mM 3-AT; wt vs. his3 mutation; HIS3;
log10 (R/G) his3 mutation.]



lated to the profile elicited by mutation of the calcineurin genes
(ρ = 0.71 ± 0.04), but did not correlate with the expression
profile resulting from treatment of the calcineurin mutant strain
with CsA (ρ = -0.05 ± 0.07; Table 2), indicating that the genetic
deletion of calcineurin interfered with the ability of CsA to
elicit its normal transcriptional response. Likewise, the CsA
signature was essentially absent in CsA-treated cph1 mutant
cells, and the expression profile of CsA-treated cph1 mutant
cells correlated poorly to that of CsA-treated wild-type cells
(ρ = 0.18 ± 0.07). Thus, the CPH1 gene product was required for
the CsA response seen in wild-type cells. Conversely, treatment
of fpr1 mutant cells with CsA resulted in an expression pattern
very similar to the profile of CsA-treated wild-type cells
(ρ = 0.77 ± 0.03), indicating that FPR1 was not necessary for the
CsA-mediated effects. Analysis of individual ORFs affected by CsA
and their expression ratios over the entire set of experiments
confirmed that CPH1 and the genes encoding calcineurin, but not
FPR1, are necessary for the wild-type CsA response (Fig. 5b). The
observation that the profiles resulting from FK506 or CsA drug
treatment are similar to that of the calcineurin deletion mutant
strain might allow the prediction that calcineurin was involved
in the pathway affected by these drugs. But because the
expression profile of the fpr1 mutant strain did not bear a
strong similarity to the wild-type drug expression profile for
FK506, it is obvious that the drug treatment of the mutant
strains was necessary to identify Fpr1, but not Cph1, as a
potential FK506 drug target. In the same way, the 'decoder'
strategy was necessary to identify Cph1, but not Fpr1, as a
potential drug target for CsA.

'Decoder' approach can identify secondary drug effects
For a drug that has a single biochemical target, the strategy out- 
lined above may be useful in target validation. In many cases, 
however, a compound may affect multiple pathways and elicit 
a very complex signature. 'Decoding' such a complex signature 



Fig. 4 Treatment of the his3 mutant strain with 3-AT shows nearly
complete loss of 3-AT signature. A plot of the log10 of the mean
intensity of hybridization for each ORF versus the log10 of its
expression ratio for each experiment is shown next to a
pseudo-color image of a representative portion of the microarray.
ORFs that are induced or repressed at the 95% confidence level
are shown in green and red, respectively. a, Expression profile
from treatment of the wild-type (wt) strain with 3-AT.
Cy5-labeled cDNA (red) from mock-treated strain R491 and
Cy3-labeled cDNA (green) from strain R491 treated with 10 mM
3-AT. b, Expression profile from the his3 deletion strain.
Cy5-labeled cDNA (red) from strain R491 and Cy3-labeled cDNA
(green) from strain R1226, deleted for the HIS3 gene. c,
Expression profile of treatment of the his3 deletion strain with
3-AT. Cy3-labeled cDNA (red) from his3-deleted strain R1226 and
Cy5-labeled cDNA (green) from strain R1226 treated with 10 mM
3-AT. Arrowheads indicate the DNA probe and data point
corresponding to the HIS3 gene. The blue dashed line represents
the threshold below which errors tend to increase rapidly because
spot intensities are not sufficiently above background intensity.
[Plot labels: wt, 10 mM 3-AT; wt vs. his3 mutant; his3 mutant,
10 mM 3-AT; log10 (intensity).]

Table 2 Signature correlation of expression ratios as a result of CsA
treatment in various mutant strains

Strain:        wild-type      cna             fpr1           cna cph1        cph1
               +/- CsA        +/- CsA         +/- CsA        +/- CsA         +/- CsA
wild-type
+/- CsA        0.94 ± 0.04    -0.05 ± 0.07    0.77 ± 0.03    -0.11 ± 0.07    0.18 ± 0.07

Signature correlation shows the absence of the CsA signature specifically in the calcineurin (cna) and cph1
(cyclophilin) deletion mutants. cna represents the mutant with deletions of the catalytic subunits of
calcineurin, CNA1 and CNA2. The correlation coefficient reported in the first column represents the correlation
between two pairs of hybridizations from independent wild-type +/- CsA experiments.



into the effects mediated through the intended target (the 'on- 
target signature') and those mediated through unintended tar- 
gets (the 'off-target' signature) might be useful in evaluating a 
compound's specificity. Our 'decoder' strategy is based on the 
premise that the 'off-target' signature should be insensitive to the
genetic disruption of the primary target. 
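
The premise can be illustrated with a small Python sketch. It is an illustration under stated assumptions rather than the analysis actually performed: the function name `decode_signature`, the ORF names and the expression ratios are hypothetical, and the 1.8-fold cutoff is simply borrowed from the threshold used for Fig. 5.

```python
def decode_signature(wt_drug, mutant_drug, min_fold=1.8):
    """Split a wild-type drug signature into on-target and off-target genes.

    wt_drug and mutant_drug map ORF name -> expression ratio (drug-treated
    vs. mock-treated) in wild-type cells and in cells deleted for the
    primary target. Genes missing from mutant_drug are treated as unchanged.
    """

    def direction(ratio):
        # +1 for induction, -1 for repression, 0 for no change beyond min_fold.
        if ratio >= min_fold:
            return 1
        if ratio <= 1.0 / min_fold:
            return -1
        return 0

    signature = {orf for orf, r in wt_drug.items() if direction(r) != 0}
    # Off-target genes keep responding, in the same direction, even when
    # the primary target is absent.
    off_target = {
        orf
        for orf in signature
        if direction(mutant_drug.get(orf, 1.0)) == direction(wt_drug[orf])
    }
    on_target = signature - off_target
    return on_target, off_target


# Hypothetical expression ratios, for illustration only.
wt = {"ORF_A": 3.0, "ORF_B": 0.4, "ORF_C": 5.0, "ORF_D": 1.2}
target_deleted = {"ORF_A": 1.1, "ORF_B": 1.0, "ORF_C": 4.5, "ORF_D": 1.0}

on_target, off_target = decode_signature(wt, target_deleted)
print(sorted(on_target), sorted(off_target))
```

In this toy input ORF_C still responds to the drug in the target-deleted strain, so it is assigned to the off-target set; ORF_D never enters the signature because its change is below the cutoff.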

To determine whether the 'decoder' approach could identify
an 'off-target' profile, we looked for a drug-responsive gene
whose expression is insensitive to deletion of the primary
target. To increase the likelihood of observing such genes, the
same strains described in Tables 1 and 2 were treated with
higher concentrations (50 µg/ml) of FK506. This led to a much
more complex expression profile in wild-type cells, indicating
that at this higher concentration, FK506 was inhibiting or
activating additional targets. Several of the ORFs in this
expanded FK506-induced expression profile were not affected by
the calcineurin, cph1 or fpr1 mutations, as drug treatment of
these mutant strains did not block their presence in the FK506
expression signature (Fig. 6). This indicates that FK506 was
triggering changes in transcript levels of many genes through
pathways independent of calcineurin, CPH1 and FPR1. Many of the
upregulated ORFs in the 'off-target' pathway were genes
reported to be regulated by the transcriptional activator Gcn4
(ref. 24). In some strains, a reporter gene under GCN4 control
was induced in response to FK506 treatment 25. To determine
whether GCN4 is involved in this pathway that is independent
of calcineurin, CPH1 and FPR1, we analyzed the effects of
treatment with high-dose FK506 on global gene expression in a
strain with a GCN4 deletion (Fig. 6). Of the 41 ORFs with
calcineurin-independent expression ratios greater than 4, 32 were
not induced in the gcn4 mutant, indicating that their induction
by FK506 was GCN4-dependent. Not all GCN4-regulated genes
were induced by FK506. This FK506-induced subset of
GCN4-regulated genes may be those most sensitive to subtle changes
in Gcn4 levels, or perhaps other regulatory circuits prevent
FK506 activation of some GCN4-regulated genes. Seven of the
remaining nine ORFs induced by FK506 were independent of

Fig. 5 Response of FK506 and CsA signature genes in strains with deletions
in different genes. Genes with expression ratios greater than a factor of 1.8 in
response to treatment with 1 µg/ml FK506 (a) or 50 µg/ml CsA (b) are listed
(left side) and their expression ratios in the indicated strain are shown on the
green (induction)-red (repression) color scale. a, Calcineurin (cna) mutant
and FK506 treatment signature genes are in the first two columns. Almost all
FK506 signature genes have expression ratios near unity in deletion strains
involved in pathways affected by FK506 (calcineurin, fpr1 and cna fpr1
mutants) but not in deletion strains in unrelated pathways (cph1). b, Calcineurin
(cna) mutant and CsA treatment signature genes are in the first two
columns. Almost all CsA signature genes have expression ratios near unity in
deletion strains involved in pathways affected by CsA (calcineurin, cph1 and
cna cph1 mutants) but not in deletion strains in unrelated pathways (fpr1).



both the calcineurin and GCN4 pathways. The simplest explanation
is that FK506 inhibits or activates additional pathways. Members
of this class include SNQ2 and PDR5, genes that encode drug
efflux pumps with structural homology to mammalian multiple drug
resistance proteins 20. FK506 may interact directly with Pdr5 to
inhibit its function 27. Our results indicate that treatment with
FK506 leads to fourfold-to-sixfold induction of PDR5 mRNA levels.
YOR1, another gene that can confer drug resistance, is also
induced threefold-to-fourfold by FK506. Thus, drug treatment of
strains with mutations in the primary targets can prove useful in
identifying effects mediated by secondary drug targets, including
the nature and extent of newly discovered and previously
unsuspected pathways affected by the drug.
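
The grouping of FK506-responsive genes described above (and used for the classes in Fig. 6) can be written as a simple rule set. The following Python sketch is illustrative only: the function name `classify_gene`, the boolean "responds" encoding and the ORF names are hypothetical, and real data would first need a threshold (such as the fourfold cutoff of Fig. 6) to decide whether a gene responds in a given strain.

```python
def classify_gene(responds):
    """Assign a gene to a response class from its behavior in each strain.

    responds maps strain name -> True if the gene responds to high-dose
    FK506 in that strain. Expected keys: "wt", "cna", "fpr1", "cph1", "gcn4".
    """
    if not responds["wt"]:
        return "no FK506 response"

    # Response survives deletion of calcineurin, FPR1 and CPH1.
    survives_cna_path = responds["cna"] and responds["fpr1"] and responds["cph1"]
    # Response is lost when calcineurin genes or FPR1 are deleted.
    blocked_by_cna_path = not responds["cna"] and not responds["fpr1"]

    if blocked_by_cna_path and responds["gcn4"]:
        return "CNA-dependent"
    if survives_cna_path and not responds["gcn4"]:
        return "GCN4-dependent"
    if survives_cna_path and responds["gcn4"]:
        return "CNA- and GCN4-independent"
    return "complex behavior"


# Hypothetical response patterns, for illustration only.
examples = {
    "ORF_1": {"wt": True, "cna": False, "fpr1": False, "cph1": True, "gcn4": True},
    "ORF_2": {"wt": True, "cna": True, "fpr1": True, "cph1": True, "gcn4": False},
    "ORF_3": {"wt": True, "cna": True, "fpr1": True, "cph1": True, "gcn4": True},
    "ORF_4": {"wt": True, "cna": False, "fpr1": True, "cph1": True, "gcn4": True},
}

classes = {orf: classify_gene(pattern) for orf, pattern in examples.items()}
for orf, label in classes.items():
    print(orf, label)
```

Genes such as ORF_4, whose pattern fits none of the three simple models, fall into the residual 'complex behavior' class.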

We describe here a method for drug target validation and the 
identification of secondary drug target effects that uses DNA mi- 
croarrays to survey the effects of drugs on global gene expres- 
sion patterns. We established that genetic and pharmacologic 
inhibition of gene function can result in extremely similar 
changes in gene expression. We also demonstrated that one can 
confirm a potential drug target by treating a deletion mutant 
defective in the gene encoding the putative target. Drug-medi- 
ated signatures from strains with mutations in pathways or 
processes directly or indirectly affected by the drug bore little or 



[Fig. 5 image: strain columns cph1, fpr1, cna and cna fpr1; color
scale from fold repression to fold induction.]



NATURE MEDICINE • VOLUME 4 • NUMBER 11 • NOVEMBER 1998

1998 Nature America Inc. • http://medicine.nature.com



[Fig. 6 image: FK506 signature genes grouped into 'CNA-dependent',
'GCN4-dependent', 'CNA- and GCN4-independent' and 'complex
behavior' classes; strain columns include fpr1, cph1, cna fpr1
and gcn4, each treated with FK506; color scale from fold
repression to fold induction.]



no similarity to the wild-type drug expression profile. In con- 
trast, drug-mediated signatures from strains with mutations in 
genes involved in pathways unrelated to the drug's action 
showed extensive similarity to the wild-type drug signature. By 
applying this approach to a drug that affects multiple pathways
(FK506), we were able to decode a complex signature into com-
ponent parts, including the identification of an 'off-target' sig-
nature that was mediated through pathways independent of
calcineurin or the Fpr1 immunophilin.

Discussion 

It is well-established that high-throughput biochemical screen- 
ing can identify potent inhibitory compounds against a given 
target. The 'decoder' approach described here complements 
this process by evaluating the equally important property of 
specificity: the tendency of a compound to inhibit pathways 
other than that of its intended target. The ability to observe 
such 'off-target' effects will likely be useful in several ways. 
Profiling compounds with known toxicities will allow the de- 
velopment of a database of expression changes associated with 
particular toxicities. Recognition of potential toxicities in the 
'off-target' signatures of otherwise promising compounds then 
may allow earlier identification of those likely to fail in clinical 
trials. Comparing the extent and peculiarities of 'off-target' sig-
natures of promising drug candidates could provide a new way
to group compounds by their effects on secondary pathways,
even before those effects are understood. This may prove to be 
an alternative, potentially more effective, way to select com- 
pounds for animal and clinical trials. Some drugs are more ef- 
fective against a related protein than against the originally 
intended target. Sildenafil (Viagra™), for example, was initially 
developed as a phosphodiesterase inhibitor to control cardiac 
contractility, but was found to be highly specific for phospho- 
diesterase 5, an isozyme whose inhibition overcomes defects in 



Fig. 6 Response of FK506 signature genes in strains with deletions
in different genes. Genes with expression ratios greater than a factor
of 4 in at least one experiment are listed and their expression ratios in
the indicated strain are shown in the green (induction)-red (repression)
color scale. The genes have been divided into classes corresponding
to these expected behaviors: 'CNA-dependent' genes
respond to FK506 (50 µg/ml) except when either calcineurin genes or
FPR1 or both are deleted; 'GCN4-dependent' genes respond to FK506
except when GCN4 is deleted. These genes still respond to FK506
when calcineurin genes or FPR1 or CPH1 are deleted; that is, their
responses are not mediated by calcineurin, Cph1, or Fpr1. 'CNA- and
GCN4-independent' genes respond to FK506 in all deletion strains
tested. A 'complex behavior' class is provided for those genes that did
not match the model of FK506 response mediated through calcineurin
or Fpr1 or separately through Gcn4.



penile erection. It is possible that application of the 'de- 
coder' to other compounds may show that they too have a 
potent activity against a target distinct from their in- 
tended target. 

The ability to decode drug effects is dependent on the
availability of functionally 'targetless' cells. In yeast, this
is being achieved by systematically disrupting each yeast
gene (Saccharomyces Deletion Consortium;
http://sequence-www.stanford.edu/group/yeast_deletion_project/deletion.html).
Efforts are underway to obtain
expression profiles from each deletion mutant strain.
Determining signatures resulting from inactivation of es- 
sential genes presents a unique problem, but it may be 
possible to do so by examining heterozygotes or by using a con- 
trollable promoter to reduce expression of the essential gene. 
Although it is already feasible to test several compounds in 
dozens of yeast strains, another challenge for the 'decoder' 
strategy will be the efficient selection of the mutants with dele- 
tions in genes most likely to encode the intended drug target. 
The signature correlation plots described are one metric that 
could be used as part of that selection process, but others need 
to be explored. Applying the 'decoder' to mammalian cells pre- 
sents additional challenges. It is considerably more difficult to 
isolate functionally 'targetless' cells. Strategies involving titrat- 
able promoters, known specific inhibitors, anti-sense RNAs, ri- 
bozymes, and methods of targeting specific proteins for 
degradation are possible and should be tested. Another limita- 
tion is that not all cell types express the same set of genes and 
therefore 'off-target' effects may be different in different cell 
types. In addition, applying the 'decoder' to human cells will
also require technical improvements that allow expression pro- 
filing from a small number of cells. Even the broader question 
of whether the insensitivity of 'off-target' signatures to the dis- 
ruption of the main target is the exception or the rule can only 
be answered by the accumulation of more data. Barkai and 
Leibler, however, have argued in favor of robustness of biologi- 
cal networks, indicating that drug perturbations ('off-target' 
signatures) may be robust even when the system is subjected to 
another perturbation (such as a genetic disruption) (ref. 28). 
Many practical developments will be necessary if the 'decoder' 
concept is to be broadly applied. 

Expression arrays have been used mainly as an initial screen 
for genes induced in a particular tissue or process of interest by 
focusing on genes with large expression ratios. We have 
found, however, that effort to refine experimental protocols 
and repeat experiments increases the reliability of the data and 
permits new applications. For example, it provides a larger set 

Table 3 Yeast strains used

Strain    Relevant genotype                                                                              Reference
YPH499    Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1                                      (34)
R563      Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 his3::HIS3                           (this study)
R558      Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 fpr1::HIS3                           (this study)
R567      Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 cph1::HIS3                           (this study)
MCY300    Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 cna1Δ1::hisG cna2Δ1::HIS3            (21)
R132      Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 cna1Δ1::hisG cna2Δ1::HIS3 cph1::kanr (this study)
R133      Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 cna1Δ1::hisG cna2Δ1::HIS3 fpr1::kanr (this study)
R559      Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 his3::HIS3 gcn4::LEU2                (this study)
BY4719    Mata trp1-Δ63 ura3-Δ0                                                                          (35)
BY4738    Mata trp1-Δ63 ura3-Δ0                                                                          (35)
R491      Mata/α BY4719 × BY4738                                                                         (this study)
BY4728    Mata his3-Δ200 trp1-Δ63 ura3-Δ0                                                                (35)
BY4729    Mata his3-Δ200 trp1-Δ63 ura3-Δ0                                                                (35)
R1226     Mata/α BY4728 × BY4729                                                                         (this study)

of genes at higher confidence levels that serve as a more
unique signature for a given protein perturbation. In addition,
it allows subtle signatures to be detected, when, for example, a
protein is only partially inhibited. This may enable clinical
monitoring of small changes in protein function in disease or
toxicity states before they could otherwise be detected.
Because the functions of many genes detected on transcript
arrays are known, these microarrays are powerful tools that
provide detailed information about a cell's physiology. For
example, changes in the flux through a metabolic pathway are
reflected in transcriptional changes in genes in the pathway 7.
Furthermore, it may be possible to indirectly measure protein
activity levels from expression profiling data (S.F., et al.,
unpublished data). Thus, although the eventual development of
genomic methods allowing the direct measurement of all cellular
protein levels will be an important achievement, transcript
array technology offers an immediate and robust means
of evaluating the effects of various treatments on gene
expression and protein function.

Methods

Construction, growth and drug treatment of yeast strains. The strains
used in this study (Table 3) were constructed by standard techniques 15.
To construct strain R559, strain R563 was transformed to Leu+ with
plasmid pM12 digested by SalI and MluI (provided by A. Hinnebusch and T.
Dever). Strains R132 and R133 were constructed by transforming the
bacterial kanamycin resistance cassette 30 flanked by genomic DNA from the
CPH1 and FPR1 loci, respectively, and selecting for G418-resistant
colonies. For experiments with FK506, cells were grown for three
generations to a density of 1 × 10^7 cells/ml in YAPD medium (YPD plus 0.004%
adenine) supplemented with 10 mM calcium chloride as described 3 '.
Where indicated, FK506 was added to a final concentration of 1 µg/ml
0.5 h after inoculation of the culture or to 50 µg/ml 1 h before cells were
collected. CsA was used at a final concentration of 50 µg/ml. Cells were
broken by standard procedures" with the following modifications: Cell
pellets were resuspended in breaking buffer (0.2 M Tris-HCl pH 7.6, 0.5 M
NaCl, 10 mM EDTA, 1% SDS), vortexed for 2 min on a VWR multi-tube
vortexer at setting 8 in the presence of 60% glass beads (425-600 µm
mesh; Sigma) and phenol:chloroform (50:50, volume/volume). After
separation of the phases, the aqueous phase was re-extracted and
ethanol-precipitated. Poly A+ RNA was isolated by two sequential
chromatographic purifications over oligo dT cellulose (New England
Biolabs, Beverly, Massachusetts) using established protocols 3 '.

For experiments using 3-AT, wild-type or his3/his3 cells were grown to
early logarithmic phase in SC medium, pelleted and resuspended in SC
medium lacking histidine for 1 hr in the presence or absence of 10 mM
3-AT, as indicated. Cells were harvested and mRNA isolated as above.
FK506 was obtained from the Swedish Hospital Pharmacy (Seattle,
Washington) and purified to homogeneity by ethyl acetate extraction by
J. Simon (Fred Hutchinson Cancer Research Center, Seattle, Washington).
CsA was obtained from Alexis Biochemicals (San Diego, California); 3-AT
was from Sigma.

Preparation and hybridization of the labeled sample. Fluorescently
labeled cDNA was prepared, purified and hybridized essentially as
described'. Cy3- or Cy5-dUTP (Amersham) was incorporated into cDNA
during reverse transcription (Superscript II; Life Technologies) and
purified by concentrating to less than 10 µl using Microcon-30
microconcentrators (Amicon, Houston, Texas). Paired cDNAs were resuspended in
20-26 µl hybridization solution (3 × SSC, 0.75 µg/ml polyA DNA, 0.2%
SDS) and applied to the microarray under a 22- × 30-mm coverslip for 6
h at 63 °C, all according to a published method'.

Fabrication and scanning of microarrays. PCR products containing
common 5' and 3' sequences (Research Genetics, Huntsville, Alabama)
were used as templates with amino-modified forward primer and
unmodified reverse primers to PCR amplify 6,065 ORFs from the S. cerevisiae
genome. Our first-pass success rate was 94%. Amplification reactions that
gave products of unexpected sizes were excluded from subsequent
analysis. ORFs that could not be amplified from purchased templates were
amplified from genomic DNA. DNA samples from 100-µl reactions were
isopropanol-precipitated, resuspended in water, brought to a final
concentration of 3 × SSC in a total volume of 15 µl, and transferred to
384-well microtiter plates (Genetix Limited, Christchurch, Dorset, England).
PCR products were spotted onto 1 × 3-inch polylysine-treated glass slides
by a robot built essentially according to defined specifications 3
(http://cmgm.stanford.edu/pbrown/MGuide). After being printed, slides
were processed according to published protocols'.

Microarrays were imaged on a prototype multi-frame CCD camera in
development at Applied Precision (Issaquah, Washington). Each CCD
image frame was approximately 2-mm square. Exposure times of 2 s in
the Cy5 channel (white light through Chroma 618-648 nm excitation
filter, Chroma 657-727 nm emission filter) and 1 s in the Cy3 channel
(Chroma 535-560 nm excitation filter, Chroma 570-620 nm emission
filter) were done consecutively in each frame before moving to the next,
spatially contiguous frame. Color isolation between the Cy3 and Cy5
channels was about 100:1 or better. Frames were 'knitted' together in
software to make the complete images. The intensity of spots (about 100
µm) was quantified from the 10-µm pixels by frame-by-frame
background subtraction and intensity averaging in each channel. Dynamic
range of the resulting spot intensities was typically a ratio of 1,000
between the brightest spots and the background-subtracted additive error
level. Normalization between the channels was accomplished by
normalizing each channel to the mean intensities of all genes. This procedure is
nearly equivalent to normalization between channels using the intensity
ratio of genomic DNA spots', but is possibly more robust, as it is based on
the intensities of several thousand spots distributed over the array.
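
The channel normalization described above amounts to dividing each channel by its own mean spot intensity before forming ratios. A minimal Python sketch follows; the function name `normalized_ratios` and the intensity values are hypothetical, and the inputs are assumed to be background-subtracted spot intensities.

```python
def normalized_ratios(red, green):
    """Return per-spot red/green ratios after mean-normalizing each channel.

    red and green are equal-length lists of background-subtracted spot
    intensities for the Cy5 and Cy3 channels.
    """
    if len(red) != len(green) or not red:
        raise ValueError("channels must be non-empty and of equal length")

    mean_red = sum(red) / len(red)
    mean_green = sum(green) / len(green)

    # Dividing by the channel mean removes any overall brightness difference
    # between the channels (for example, unequal exposure or labeling yield).
    red_norm = [r / mean_red for r in red]
    green_norm = [g / mean_green for g in green]
    return [r / g for r, g in zip(red_norm, green_norm)]


# Hypothetical intensities: the red channel is uniformly twice as bright,
# so every normalized ratio should come out at 1.
green_channel = [100.0, 200.0, 300.0, 400.0]
red_channel = [200.0, 400.0, 600.0, 800.0]

ratios = normalized_ratios(red_channel, green_channel)
print(ratios)
```

Because the scaling uses every spot on the array, a handful of aberrant spots has little influence on it, which is the robustness argument made in the text.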

Signature correlation coefficients and their confidence limits.
Correlation coefficients between the signature ORFs of various
experiments were calculated using:

$$\rho = \frac{\sum_k x_k y_k}{\left(\sum_k x_k^2 \, \sum_k y_k^2\right)^{1/2}}$$

where x_k is the log10 of the expression ratio for the kth gene in the x
signature, and y_k is the log10 of the expression ratio for the kth gene in the y
signature. The summation is over those genes that were either up- or
down-regulated in either experiment at the 95% confidence level. These
genes each had a less than 5% chance of being actually unregulated
(having expression ratios departing from unity due to measurement errors
alone). This confidence level was assigned based on an error model which
assigns a lognormal probability distribution to each gene's expression
ratio with characteristic width based on the observed scatter in its
repeated measurements (repeated arrays at the same nominal experimental
conditions) and on the individual array hybridization quality. This latter
dependence was derived from control experiments in which both Cy3
and Cy5 samples were derived from the same RNA sample. For large
numbers of repeated measurements the error reduces to the observed
scatter. For a single measurement the error is based on the array quality
and the spot intensity.
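
The correlation defined above is an uncentered correlation of log10 expression ratios, summed over the significantly regulated genes. A minimal Python sketch is given below. The function name `signature_correlation` and the example ratios are hypothetical, and the set of significant genes is taken as an input: the error model used to select them is not reproduced here.

```python
import math


def signature_correlation(x_ratios, y_ratios, significant):
    """Uncentered correlation of log10 expression ratios.

    x_ratios and y_ratios map ORF name -> expression ratio in the two
    signatures; significant is the collection of ORFs regulated at the
    chosen confidence level in either experiment.
    """
    sum_xy = sum_xx = sum_yy = 0.0
    for orf in significant:
        x = math.log10(x_ratios[orf])
        y = math.log10(y_ratios[orf])
        sum_xy += x * y
        sum_xx += x * x
        sum_yy += y * y
    if sum_xx == 0.0 or sum_yy == 0.0:
        raise ValueError("at least one signature has no non-unity ratios")
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical ratios, for illustration only.
drug = {"ORF_A": 10.0, "ORF_B": 0.1, "ORF_C": 100.0}
mutant_same = {"ORF_A": 10.0, "ORF_B": 0.1, "ORF_C": 100.0}
mutant_opposite = {"ORF_A": 0.1, "ORF_B": 10.0, "ORF_C": 0.01}

rho_same = signature_correlation(drug, mutant_same, drug.keys())
rho_opposite = signature_correlation(drug, mutant_opposite, drug.keys())
print(rho_same, rho_opposite)
```

Note that, unlike the Pearson coefficient, no mean is subtracted, so an unchanged gene (ratio of 1, log10 of 0) serves as the natural origin.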

Random measurement errors in the x and y signatures tend to bias the 
correlation towards zero. In most experiments, most genes are not signif- 
icantly affected but do show small random measurement errors. Selecting 
only the '95% confidence' genes for the correlation calculation, rather 
than the entire genome, reduces this bias and makes the actual biological 
correlations more apparent. 
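
The bias described above can be seen in a small simulation. The sketch below is an illustration under assumed parameters, not a reanalysis of the data: the numbers of genes, the noise level `SIGMA` and the three-sigma selection rule are all assumptions standing in for the error model and 95% confidence selection used in the study.

```python
import math
import random


def uncentered_corr(xs, ys):
    """Uncentered correlation, as in the signature correlation formula."""
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return sum_xy / math.sqrt(sum(x * x for x in xs) * sum(y * y for y in ys))


rng = random.Random(0)   # fixed seed so the run is repeatable
SIGMA = 0.2              # assumed measurement error (log10 units)
N_SIGNAL = 50            # genes truly regulated, identically in both signatures
N_NULL = 5000            # genes with no true change

# True log10 ratios shared by both experiments: +1 or -1 for regulated
# genes, 0 for everything else.
true_log = [1.0 if i % 2 == 0 else -1.0 for i in range(N_SIGNAL)]
true_log += [0.0] * N_NULL

# Each experiment observes the truth plus independent measurement error.
x = [t + rng.gauss(0.0, SIGMA) for t in true_log]
y = [t + rng.gauss(0.0, SIGMA) for t in true_log]

# Correlation over the whole genome: noise in unregulated genes inflates
# the denominator and pulls the value towards zero.
rho_all = uncentered_corr(x, y)

# Correlation over genes that stand out from the noise in either experiment.
keep = [
    i for i in range(len(true_log))
    if abs(x[i]) > 3 * SIGMA or abs(y[i]) > 3 * SIGMA
]
rho_selected = uncentered_corr([x[i] for i in keep], [y[i] for i in keep])

print(round(rho_all, 2), round(rho_selected, 2))
```

Although the underlying biological correlation in this simulation is perfect, the genome-wide value is low, while the value computed over the selected genes is much closer to unity.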

Correlations between a profile and itself are unity by definition. Error
limits on the correlation are 95% confidence limits based on the
individual measurement error bars, and assuming uncorrelated errors 51. They do
not include the bias mentioned above: thus, a departure of ρ from unity
does not necessarily mean that the underlying biological correlation is
imperfect. However, a correlation of 0.7 ± 0.1, for example, is very
significantly different from zero. Small (magnitude of ρ < 0.2) but formally
significant correlations in the tables and text probably are due to small
systematic biases in the Cy5/Cy3 ratios that violate the assumption of
independent measurement errors used to generate the 95% confidence
limits. Therefore, these small correlation values should be treated as not
significant. A likely source of uncorrected systematic bias is the partially
corrected scanner detector nonlinearity that differently affects the Cy3
and Cy5 detection channels.

The 1 ng/ml FK506 treatment signature was compared with more 
than 40 unrelated deletion mutant strain or drug signatures. These control 
profiles had correlation coefficients with the FK506 profile that were 
distributed around zero (mean ρ = -0.03) with a standard deviation of 
0.16 (data not shown), and none had correlations greater than ρ = 0.38. 
Similarly, the calcineurin mutant strain signature correlated well with the 
CsA treatment signature (ρ = 0.71 ± 0.04) but not with the signatures 
from the negative controls (mean ρ = -0.02 with a standard deviation of 
0.18). 

Quality controls. End-to-end checks on expression ratio measurement 
accuracy were provided by analyzing the variance in repeated hybridiza- 
tions using the same mRNA labeled with both Cy3 and Cy5, and also 
using Cy3 and Cy5 mRNA samples isolated from independent cultures of 
the same nominal strain and conditions. Biases undetected with this pro- 
cedure, such as gene-specific biases presumably due to differential incor- 
poration of Cy3- and Cy5-dUTP into cDNA, were minimized by doing 
hybridizations in fluor-reversed pairs, in which the Cy3/Cy5 labeling of 
the biological conditions was reversed in one experiment with respect to 
the other. The expression ratio for each gene is then the ratio of ratios be- 
tween the two experiments in the pair. Other biases are removed by algo- 
rithmic numerical de-trending. The magnitude of these biases in the 
absence of de-trending and fluor reversal is typically about 30% in the 
ratio, but may be as high as twofold for some ORFs. 
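
The cancellation achieved by fluor reversal can be sketched as follows. The sketch assumes a purely multiplicative, gene-specific dye bias on the Cy5/Cy3 ratio, and it takes the square root of the ratio of ratios to return to the scale of a single expression ratio; the square root is an assumption made here for illustration, since the text specifies only the ratio of ratios.

```python
import math

def combine_fluor_reversed(ratio_forward, ratio_reversed):
    """Combine Cy5/Cy3 ratios from a fluor-reversed pair of hybridizations.

    ratio_forward:  Cy5/Cy3 with the experimental sample labeled with Cy5.
    ratio_reversed: Cy5/Cy3 with the experimental sample labeled with Cy3.
    A multiplicative dye bias appears in both ratios and cancels in the quotient.
    """
    ratio_of_ratios = ratio_forward / ratio_reversed
    # Assumption: the quotient equals the squared true ratio, so take the root.
    return math.sqrt(ratio_of_ratios)
```

For a gene with a true twofold induction and a hypothetical 30% dye bias, the two measured ratios would be 2.0 × 1.3 and 0.5 × 1.3; their quotient is 4, and the combined estimate returns the unbiased twofold value.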

Expression ratios are based on mean intensities over each spot. Some 
smaller spots have fewer image pixels in the average. This does not 
degrade accuracy noticeably until the number of pixels falls below ten, in 
which case the spot is rejected from the data set. 'Wander' of spot 
positions with respect to the nominal grid is adaptively tracked in array 
subregions by the image processing software. Unequal spot 'wander' within 
a subregion greater than half-a-spot spacing is a difficulty for the 
automated quantitating algorithms; in this case, the spot is rejected from 
analysis based on human inspection of the 'wander'. Any spots partially 
overlapping are excluded from the data set. Less than 1% of spots typi- 
cally are rejected for these reasons. 
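
The rejection rules above can be summarized in a short sketch. The `Spot` record and its field names are hypothetical; only the ten-pixel minimum, the half-spacing wander limit, and the exclusion of overlapping spots come from the text.

```python
from dataclasses import dataclass

MIN_PIXELS = 10             # from the text: fewer than ten pixels means rejection
MAX_WANDER_FRACTION = 0.5   # from the text: half a spot spacing

@dataclass
class Spot:
    n_pixels: int             # image pixels contributing to the mean intensity
    wander_fraction: float    # residual displacement, in units of spot spacing
    overlaps_neighbor: bool   # True if the spot partially overlaps another

def spot_status(spot):
    """Classify a spot as 'keep', 'reject', or 'inspect'."""
    if spot.n_pixels < MIN_PIXELS:
        return "reject"
    if spot.overlaps_neighbor:
        return "reject"
    if spot.wander_fraction > MAX_WANDER_FRACTION:
        # The text leaves the final decision here to human inspection.
        return "inspect"
    return "keep"
```

Returning a separate "inspect" status, rather than rejecting outright, reflects the statement that wander-based rejection depends on human review.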
The temporal program of gene expression during a model physiological 
response of human cells, the response of fibroblasts to serum, was explored with 
a complementary DNA microarray representing about 8600 different human 
genes. Genes could be clustered into groups on the basis of their temporal 
patterns of expression in this program. Many features of the transcriptional 
program appeared to be related to the physiology of wound repair, suggesting 
that fibroblasts play a larger and richer role in this complex multicellular 
response than had previously been appreciated. 



The response of mammalian fibroblasts to serum has been used as a model 
for studying growth control and cell cycle progression (1). Normal human 
fibroblasts require growth factors for proliferation in culture; these 
growth factors are usually provided by fetal bovine serum (FBS). In the 
absence of growth factors, fibroblasts enter a nondividing state, termed 
G0, characterized by low metabolic activity. Addition of FBS or purified 
growth factors induces proliferation of the fibroblasts; the changes in 
gene expression that accompany this proliferative response have been the 
subject of many studies, and the responses of dozens of genes to serum 
have been characterized. 

We took a fresh look at the response of human fibroblasts to serum, using 
cDNA microarrays representing about 8600 distinct human genes to observe 
the temporal program of transcription that underlies this response. Primary 
cultured fibroblasts from human neonatal foreskin were induced to enter a 
quiescent state by serum deprivation for 48 hours and then stimulated by 
addition of medium containing 10% FBS (2). DNA microarray hybridization 
was used to measure the temporal changes in mRNA levels of 8613 human 
genes (3) at 12 times, ranging from 15 min to 24 hours after serum 
stimulation. The cDNA made from purified mRNA from each sample was labeled 
with the fluorescent dye Cy5 and mixed with a common reference probe 
consisting of cDNA made from purified mRNA from the quiescent culture 
(time zero) labeled with a second fluorescent dye, Cy3 (4). The color 
images of the hybridization results (Fig. 1) were made by representing the 
Cy3 fluorescent image as green and the Cy5 fluorescent image as red and 
merging the two color images. 

Fig. 1. The same section of the microarray is shown for three independent 
hybridizations comparing RNA isolated at the 8-hour time point after serum 
treatment to RNA from serum-deprived cells. Each microarray contained 9996 
elements, including 9804 human cDNAs, representing 8613 different genes. 
mRNA from serum-deprived cells was used to prepare cDNA labeled with 
Cy3-deoxyuridine triphosphate (dUTP), and mRNA harvested from cells at 
different times after serum stimulation was used to prepare cDNA labeled 
with Cy5-dUTP. The two cDNA probes were mixed and simultaneously hybridized 
to the microarray. The image of the subsequent scan shows genes whose 
mRNAs are more abundant in the serum-deprived fibroblasts (that is, 
suppressed by serum treatment) as green spots and genes whose mRNAs are 
more abundant in the serum-treated fibroblasts as red spots. Yellow spots 
represent genes whose expression does not vary substantially between the 
two samples. The arrows indicate the spots representing the following 
genes: 1, protein disulfide isomerase-related protein P5; 2, IL-8 
precursor; 3, EST AA057170; and 4, vascular endothelial growth factor. 

Diverse temporal profiles of gene expression could be seen among the 8613 
genes surveyed in this experiment (Fig. 2); many of these genes (about 
half) were unnamed expressed sequence tags (ESTs) (5). Although diverse 
patterns of expression were observed, the orderly choreography of the 
expression program became apparent when the results were analyzed by a 
clustering and display method developed in our laboratory for analyzing 
genome-wide gene expression data (6). An example of such an analysis, here 
applied to a subset of 517 genes whose expression changed substantially in 
response to serum (7), is shown in Fig. 2. The entire detailed data set 
underlying Fig. 2 is available as a tab-delimited table (in cluster order) 
at the Science Web site (www.sciencemag.org/feature/data/984559.shl). In 
addition, the entire, larger data set for the complete set of genes 
analyzed in this experiment can be found at a Web site maintained by our 
laboratory (genome-www.stanford.edu/serum) (8). 
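
The general idea of grouping genes by the similarity of their temporal profiles can be sketched with a generic average-linkage scheme. This is an illustration only and is not necessarily identical to the published procedure (6): the use of the Pearson correlation as the similarity measure, the stopping rule, and the function names are all assumptions.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def average_linkage(profiles, n_clusters):
    """Merge the two most similar clusters until n_clusters remain.

    profiles: dict mapping a gene name to its list of log ratios over time.
    Cluster similarity is the mean pairwise correlation between members.
    """
    clusters = [[name] for name in profiles]

    def similarity(c1, c2):
        pairs = [(g, h) for g in c1 for h in c2]
        return sum(pearson(profiles[g], profiles[h]) for g, h in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        candidates = [
            (i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))
        ]
        i, j = max(candidates, key=lambda ij: similarity(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]
```

Applied to a handful of made-up profiles, two transiently induced genes and two late-induced genes fall into separate groups, which is the kind of structure the cluster image is intended to reveal.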

One measure of the reliability of the 
changes we observed is inherent in the ex- 
pression profiles of the genes. For most genes 
whose expression levels changed, we could 
see a gradual change over a few time points, 
which thus effectively provided independent 
measurements for almost all of the observa- 
tions. An additional check was provided by 
the inclusion of duplicate and, in a few cases, 
multiple array elements representing the 
same gene for about 5% of the genes included 
in this microarray. In addition, three indepen- 
dent hybridizations to different microarrays 
with mRNA samples from cells harvested 8 
hours after serum addition showed good cor- 
relation (Fig. 1). As an independent test, we 
measured the expression levels of several 
genes using the TaqMan 5' nuclease fluori- 
genic quantitative polymerase chain reaction 
(PCR) assay (9). The expression profiles of 
the genes, as measured by these two indepen- 
dent methods, were very similar (Fig. 3) (10). 

The transcriptional response of fibroblasts to serum was extremely rapid. 
The immediate response to serum stimulation was dominated by genes that 
encode transcription factors and other proteins involved in signal 
transduction. The mRNAs for several genes [including c-FOS, JUN B, and 
mitogen-activated protein (MAP) kinase phosphatase-1 (MKP1)] were 
detectably induced within 15 min after serum stimulation (Fig. 4, A and 
B). Fifteen of the genes that were observed to be induced by serum encode 
known or suspected regulators of transcription (Fig. 4B). All but one were 
immediate-early genes; their induction was not inhibited by cycloheximide 
(11). This class of genes could be distinguished into those whose 
induction was transient (Fig. 2, cluster E) and those whose mRNA levels 
remained induced for much longer (Fig. 2, clusters I and J). Some features 
of the immediate response appeared to be directed at adaptation to the 
initiating signals. We observed a marked induction of mRNA encoding MKP1, 
a dual-specificity phosphatase that modulates the activity of the ERK1 and 
ERK2 MAP kinases (12). The coincidence of the peak of expression of genes 
in cluster E (Fig. 2) with that of MKP1 (Fig. 4A) suggests the possibility 


that continued activity of the MAP kinase pathway is required to maintain 
induction of these genes but not of those with sustained expression 
(clusters I and J). The gene encoding a second member of the 
dual-specificity MAP kinase phosphatase family, known as dual-specificity 
protein phosphatase 6/pyst2, was induced later, at about 4 hours after 
serum stimulation. Genes encoding diverse other proteins with roles in 
signal transduction, ranging from cell-surface receptors [for example, the 
sphingosine 1-phosphate receptor (EDG-1), the vascular endothelial growth 
factor receptor, and the type II BMP receptor] to regulators of G-protein 
signaling (for example, NET1/p115 rho GEF) to DNA-binding transcription 
factors, were induced by serum (Fig. 4A). 

Fig. 2. Cluster image showing the different classes of gene expression 
profiles. Five hundred seventeen genes whose mRNA levels changed in 
response to serum stimulation were selected (7). This subset of genes was 
clustered hierarchically into groups on the basis of the similarity of 
their expression profiles by the procedure of Eisen et al. (6). The 
expression pattern of each gene in this set is displayed here as a 
horizontal strip. For each gene, the ratio of mRNA levels in fibroblasts 
at the indicated time after serum stimulation ("unsync" denotes 
exponentially growing cells) to its level in the serum-deprived (time 
zero) fibroblasts is represented by a color, according to the color scale 
at the bottom. The graphs show the average expression profiles for the 
genes in the corresponding "cluster" (indicated by the letters A to J and 
color coding). In every case examined, when a gene was represented by more 
than one array element, the multiple representations in this set were seen 
to have identical or very similar expression profiles, and the profiles 
corresponding to these independent measurements clustered either adjacent 
or very close to each other, pointing to the robustness of the clustering 
algorithm in grouping genes with very similar patterns of expression. 

The reprogramming of the regulatory cir- 
cuits in response to serum involved not only 
induction of transcription factors but also re- 
duced expression of many transcriptional reg- 
ulators — some of which may play roles in 
maintaining the cells in G0 or in priming 
them to react to wounding (Fig. 4C). Perhaps 
as a consequence of the historical focus on 
genes induced by serum stimulation of fibro- 
blasts, the set of transcription factors whose 
expression diminished upon serum stimula- 
tion has been less well characterized. 

Genes known or likely to be involved in controlling and mediating the 
proliferative response showed distinctive patterns of regulation. Several 
genes whose products inhibit progression of the cell-division cycle, such 
as p27 Kip1, p57 Kip2, and p18, were expressed in the quiescent 
fibroblasts and down-regulated before the onset of cell division. The 
nadir in the mRNA levels for these genes occurred between 6 and 12 hours 
after serum stimulation (Fig. 5A), coincident with the passage of the 
fibroblasts through G1. The levels of the transcript encoding the 
WEE1-like protein kinase, which is believed to inhibit mitosis by 
phosphorylation of Cdc2, diminished between 4 and 8 to 12 hours after 
serum addition (Fig. 5A), well before the onset of M phase at around 16 
hours, raising the possibility of an additional role for Wee1 in an 
earlier stage of the cell cycle or in regulating the G0 to G1 transition. 
Several genes induced in the first few hours after serum stimulation, such 
as the helix-loop-helix proteins ID2 and ID3 and EST AA016305, a gene with 
homology to G1-S cyclins, are candidates for roles in promoting the exit 
from G0. 

Genes involved in mediating progression through the cell cycle were 
characterized by a distinctive pattern of expression (Fig. 2, cluster D), 
reflecting the coincidence of their expression with the reentry of the 
stimulated fibroblasts into the cell-division cycle. The stimulated 
fibroblasts replicated their DNA about 16 hours after serum treatment. 
This timing was reflected by the induction of mRNA encoding both subunits 
of ribonucleotide reductase and PCNA, the processivity factor for DNA 
polymerase epsilon and delta. Cyclin A, Cyclin B1, Cdc2, and CDC28 kinase, 
regulators of passage through the S phase and the transition from G2 to M 
phase, were induced at about 16 to 20 hours after serum addition. The 
kinase in the Cyclin B1-CDK pair needs to be activated by phosphorylation. 
The gene encoding Cyclin-dependent kinase 7 (CDK7; a homolog of Xenopus 
MO15 cdk-activating kinase) was induced in parallel with the Cdc2 and 
Cdc28 kinases (Fig. 5A), suggesting a potential role for CDK7 in mediating 
M phase. DNA topoisomerase II α, required for chromosome segregation at 
mitosis; Mad2, a component of the spindle checkpoint that prevents 
completion of mitosis (anaphase) if chromosomes are not attached to the 
spindle; and the kinetochore protein CENP-F all showed a similar 
expression profile. 

In the hours after the serum stimulus, one of 
the most striking features of the unfolding tran- 
scriptional program was the appearance of nu- 
merous genes with known roles in processes 
relevant to the physiology of wound healing. 
These included both genes involved in the direct role played by 
fibroblasts in remodeling of the clot and the extracellular matrix and, more 
notably, genes encoding proteins involved in 
intercellular signaling (Fig. 5). Genes induced 
in this program encode products that can (i) 
participate in the dynamic process of clotting, 
clot dissolution, and remodeling and perhaps 
contribute to hemostasis by promoting local 
vasoconstriction (for example, endothelin-1); 
(ii) promote chemotaxis and activation of neu- 
trophils (for example, COX2) and recruitment 
and extravasation of monocytes and macro- 
phages (for example, MCP1); (iii) promote 
chemotaxis and activation of T lymphocytes 
[for example, interleukin-8 (IL-8)] and B 
lymphocytes (for example, ICAM-1), thus 
providing both innate and antigen-specific 
defenses against wound infection and recruit- 
ing the phagocytic cells that will be required 
to clear out the debris during remodeling of 
the wound; (iv) promote angiogenesis and 
neovascularization (for example, VEGF) 
through newly forming tissue; (v) promote 
migration and proliferation of fibroblasts (for 
example, CTGF) and their differentiation into 
myofibroblasts (for example, Vimentin); and 
(vi) promote migration and proliferation of 
keratinocytes, leading to reepithelialization 
of the wound (for example, FGF7), and pro- 
mote proliferation of melanocytes, perhaps 
contributing to wound hyperpigmentation 
(for example, FGF2). 

Coordinated regulation of groups of genes whose products act at different 
steps in a common process was a recurring theme. For example, Furin, a 
prohormone-processing protease required for one of the processing steps in 
the generation of active endothelin, was induced in parallel with 
induction of the gene encoding the precursor of endothelin-1 (Fig. 5E) 
(13). Conversely, expression of CALLA/CD10, a membrane metalloprotease 
that degrades endothelin-1 and other peptide mediators of acute 
inflammation, was reduced. A second example is provided by a set of five 
genes involved in the biosynthesis of cholesterol (Fig. 5I). The mRNAs 
encoding each of these enzymes showed sharply diminished expression 
beginning 4 to 6 hours after serum stimulation of fibroblasts. A likely 
explanation for the coordinated down-regulation of the cholesterol 
biosynthetic pathway is that serum provides cholesterol to fibroblasts 
through low-density lipoproteins, whereas in the absence of the 
cholesterol provided by serum, endogenous cholesterol biosynthesis in 
fibroblasts is required. 

Fig. 3. Independent verification of microarray quantitation. Relative mRNA 
levels of the indicated genes (Mast, mast/stem cell growth factor receptor) 
were measured with the TaqMan 5' nuclease fluorigenic quantitative PCR 
assay (9) (left) in the same samples that were used to prepare probes for 
microarray hybridizations (right). Data from the TaqMan analysis were 
normalized to mRNA concentrations and plotted relative to the level at 
time zero, so that the results could be compared with those from the 
microarray hybridizations. In general, quantitation with the two methods 
gave very similar results (10). 

Many of the previously studied genes that we observed to be regulated in 
this program have no recognized role in any aspect of wound healing or 
fibroblast proliferation. Their identification in this study may therefore 
point to previously unknown aspects of these processes. A few selected 
genes in this group are shown in Fig. 5H. The stanniocalcin gene, for 
example (Fig. 5H), encodes a secreted protein without a clearly identified 
function in human cells (14, 15). Its induction in serum-stimulated 
fibroblasts suggests the possibility that it may play a role in the 
wound-healing process, perhaps serving as a signal in mediating 
inflammation or angiogenesis. 

Fig. 4. "Reprogramming" of fibroblasts. Expression profiles of genes whose 
function is likely to play a role in the reprogramming phase of the 
response are shown with the same representation as in Fig. 2. In the cases 
in which a gene was represented by more than one element in the 
microarray, all measurements are shown. The genes were grouped into 
categories on the basis of our knowledge of their most likely role. Some 
genes with pleiotropic roles were included in more than one category. 

One of the most important results of this exploration was the discovery 
of over 200 previously unknown genes whose expression was regulated in 
specific temporal patterns during the response of fibroblasts to serum. 
For example, 13 of the 40 genes in cluster D (Fig. 2) have descriptive 
names that reflect their putative function. Nine of these 13 genes (69%) 
encode proteins that play roles in cell cycle progression, particularly in 
DNA replication and the G2-M transition. This enrichment for cell 
cycle-related genes suggests that some of the unnamed genes in this 
cluster (for example, EST W79311 and EST R13146, neither of which have 
sequence similarity to previously characterized genes) may represent 
previously unknown genes involved in this part of the cell cycle. 
Similarly, a remarkable fraction of genes that were grouped into cluster F 
on the basis of their expression profiles encoded proteins involved in 
intercellular signaling (Fig. 2), suggesting that a similar role should be 
considered for the many unnamed genes in this cluster. A 
disproportionately large fraction of the genes whose transcription 
diminished upon serum stimulation were unnamed ESTs. 

Our intention was to use this experiment as a model to study the control 
of the transition from G0 to a proliferating state. However, one of the 
defining characteristics of genome-scale expression profiling experiments 
is that the examination of so many diverse genes opens a window on all the 
processes that actually occur and not merely the single process one 
intended to observe. Serum, the soluble fraction of clotted blood, is 
normally encountered by cells in vivo in the context of a wound. Indeed, 
the expression program that we observed in response to serum suggests that 
fibroblasts are programmed to interpret the abrupt exposure to serum not 
as a general mitogenic stimulus but as a specific physiological signal, 
signifying a wound. The proliferative response that we originally intended 
to study appeared to be part of a larger physiological response of 
fibroblasts to a wound. Other features of the transcriptional response to 
serum suggest that the fibroblast is an active participant in a 
conversation among the diverse cells that work together in wound repair, 
interpreting, amplifying, modifying, and broadcasting signals controlling 
inflammation, angiogenesis, and epithelial regrowth during the response to 
an injury. 

Fig. 5. The transcriptional response to serum suggests a multifaceted role for fibroblasts in the 
physiology of wound healing. The features of the transcriptional program of fibroblasts in response 
to serum stimulation that appear to be related to various aspects of the wound-healing process and 
fibroblast proliferation are shown with the same convention for representing changes in transcript 
levels as was used in Figs. 2 and 4. (A) Cell cycle and proliferation, (B) coagulation and hemostasis, 
(C) inflammation, (D) angiogenesis, (E) tissue remodeling, (F) cytoskeletal reorganization, (G) 
reepithelialization, (H) unidentified role in wound healing, and (I) cholesterol biosynthesis. The 
numbers in (C) and (G) refer to genes whose products serve as signals to neutrophils (C1), 
monocytes and macrophages (C2), T lymphocytes (C3), B lymphocytes (C4), and melanocytes (G1). 

We recognize that these in vitro results 
almost certainly represent a distorted and in- 
complete rendering of the normal physiolog- 
ical response of a fibroblast to a wound. 
Moreover, only the responses elicited directly 
by exposure of fibroblasts to serum were 
examined. The subsequent signals from other 
cellular participants in the normal wound- 
healing process would certainly provoke fur- 
ther evolution of the transcriptional program 
in fibroblasts at the site of a wound, which 
this experiment cannot reveal. Nevertheless, 
we believe that the picture that emerged 
strongly suggests a much larger and richer 
role for the fibroblast in the orchestration of 
this important physiological process than had 
previously been suspected. 

We used cDNA microarrays to explore the variation in expression of approximately 8,000 unique genes among the 
60 cell lines used in the National Cancer Institute's screen for anti-cancer drugs. Classification of the cell lines based 
solely on the observed patterns of gene expression revealed a correspondence to the ostensible origins of the 
tumours from which the cell lines were derived. The consistent relationship between the gene expression patterns 
and the tissue of origin allowed us to recognize outliers whose previous classification appeared incorrect. Specific 
features of the gene expression patterns appeared to be related to physiological properties of the cell lines, such 
as their doubling time in culture, drug metabolism or the interferon response. Comparison of gene expression pat- 
terns in the cell lines to those observed in normal breast tissue or in breast tumour specimens revealed features of 
the expression patterns in the tumours that had recognizable counterparts in specific cell lines, reflecting the 
tumour, stromal and inflammatory components of the tumour tissue. These results provided a novel molecular 
characterization of this important group of human cell lines and their relationships to tumours in vivo. 



Introduction 

Cell lines derived from human tumours have been extensively used 
as experimental models of neoplastic disease. Although such cell 
lines differ from both normal and cancerous tissue, the inaccessi- 
bility of human tumours and normal tissue makes it likely that 
such cell lines will continue to be used as experimental models for 
the foreseeable future. The National Cancer Institute's Develop- 
mental Therapeutics Program (DTP) has carried out intensive 
studies of 60 cancer cell lines (the NCI60) derived from tumours 
from a variety of tissues and organs 1 ^ 1 . The DTP has assessed many 
molecular features of the cells related to cancer and chemotherapeutic 
sensitivity, and has measured the sensitivities of these 60 cell 
lines to more than 70,000 different chemical compounds, includ- 
ing all common chemotherapeutics (http://dtp.nci.nih.gov). A 
previous analysis of these data revealed a connection between the 
pattern of activity of a drug and its method of action. In particular, 
there was a tendency for groups of drugs with similar patterns of 
activity to have related methods of action 3 - 5-7 . 

We used DNA microarrays to survey the variation in abun- 
dance of approximately 8,000 distinct human transcripts in these 
60 cell lines. Because of the logical connection between the func- 
tion of a gene and its pattern of expression, the correlation of gene 
expression patterns with the variation in the phenotype of the cell 
can begin the process by which the function of a gene can be 
inferred. Similarly, the patterns of expression of known genes can 
reveal novel phenotypic aspects of the cells and tissues studied 8-10 . 
Here we present an analysis of the observed patterns of gene 
expression and their relationship to phenotypic properties of the 
60 cell lines. The accompanying report (ref. 11) explores the relationship 
between the gene expression patterns and the drug sensitivity pro- 
files measured by the DTP. The assessment of gene expression pat- 
terns in a multitude of cell and tissue types, such as the diverse set 
of cell lines we studied here, under diverse conditions in vitro and 
in vivo, should lead to increasingly detailed maps of the human 
gene expression program and provide clues as to the physiological 
roles of uncharacterized genes 11-16 . The databases, plus tools for 
analysis and visualization of the data, are available 
(http://genome-www.stanford.edu/nci60 and http://discover.nci.nih.gov). 

Results 

We studied gene expression in the 60 cell lines using DNA 
microarrays prepared by robotically spotting 9,703 human 
cDNAs on glass microscope slides 17,18 . The cDNAs included 
approximately 8,000 different genes: approximately 3,700 repre- 
sented previously characterized human proteins, an additional 
1,900 had homologues in other organisms and the remaining 
2,400 were identified only by ESTs. Due to ambiguity of the identity 
of the cDNA clones used in these studies, we estimated that 
approximately 80% of the genes in these experiments were correctly 
identified. The identities of approximately 3,000 cDNAs 
from these experiments have been sequence-verified, including 
all of those referred to here by name. 

Fig. 1 Gene expression patterns related to the tissue of origin of the cell lines. Two-dimensional 
hierarchical clustering was applied to expression data from a set of 1,161 cDNAs 
measured across 54 cell lines. The 1,161 cDNAs were those (of 9,703 total) with transcript 
levels that varied by at least sevenfold (log2 (ratio) >2.8) relative to the reference pool in at 
least 4 of 60 cell lines. This effectively selected genes with the greatest variation in expression 
level across the 60 cell lines (including those genes not well represented in the reference 
pool), and therefore highlighted those gene expression patterns that best 
distinguished the cell lines from one another. Data from 64 hybridizations were used, one 
for each cell line plus the two additional independent representations of each of the cell 
lines K562 and MCF7. The two cell lines represented in triplicate were correspondingly 
weighted for the gene clustering so that each of the 60 cell lines contributed equally to the 
clustering. a, The cell-line dendrogram, with the terminal branches coloured to reflect the 
ostensible tissue of origin of the cell line (red, leukaemia; green, colon; pink, breast; purple, 
prostate; light blue, lung; orange, ovarian; yellow, renal; grey, CNS; brown, melanoma; 
black, unknown (NCI/ADR-RES)). The scale to the right of the dendrogram depicts the 
correlation coefficient represented by the length of the dendrogram branches connecting 
pairs of nodes. Note that the two triplets of replicated cell lines (K562 and MCF7) cluster 
tightly together and were well differentiated from even the most closely related cell lines, 
indicating that this clustering of cell lines is based on characteristic variations in their gene 
expression patterns rather than artefacts of the experimental procedures. b, A coloured 
representation of the data table, with the rows (genes) and columns (cell lines) in cluster 
order. The dendrogram representing hierarchical relationships between genes was omitted 
for clarity, but is available (http://genome-www.stanford.edu/nci60). The colour in each 
cell of this table reflects the mean-adjusted expression level of the gene (row) and cell line 
(column). The colour scale used to represent the expression ratios is shown. The labels 
"3a-3d" in (b) refer to the clusters of genes shown in detail in Fig. 3. 
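
The gene-selection rule stated in the Fig. 1 legend (at least sevenfold variation relative to the reference pool in at least 4 of the 60 cell lines) can be sketched as follows. Treating the cutoff as applying to the absolute log2 ratio, so that both increases and decreases count, and representing missing values as `None`, are assumptions made for illustration.

```python
import math

LOG2_CUTOFF = 2.8    # from the legend; 2**2.8 is roughly sevenfold
MIN_CELL_LINES = 4   # from the legend

def passes_variation_filter(ratios, cutoff=LOG2_CUTOFF, min_lines=MIN_CELL_LINES):
    """Decide whether one cDNA varies enough across cell lines to be kept.

    ratios: normalized Cy5/Cy3 ratios for one cDNA, one entry per cell line;
            None marks a missing or unusable measurement (assumed convention).
    """
    n_extreme = sum(
        1
        for r in ratios
        if r is not None and r > 0 and abs(math.log2(r)) > cutoff
    )
    return n_extreme >= min_lines
```

A cDNA with an eightfold ratio in four cell lines and a ratio of one elsewhere passes this filter, whereas a cDNA with a uniform sixfold ratio in every cell line does not.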

Each hybridization compared Cy5-labelled cDNA reverse tran- 
scribed from mRNA isolated from one of the cell lines with Cy3- 
labelled cDNA reverse transcribed from a reference mRNA 
sample. This reference sample, used in all hybridizations, was 
prepared by combining an equal mixture of mRNA from 12 of 
the cell lines (chosen to maximize diversity in gene expression as 
determined primarily from two-dimensional gel studies 2 ). By 
comparing cDNA from each cell line with a common reference, 
variation in gene expression across the 60 cell lines could be 
inferred from the observed variation in the normalized Cy5/Cy3 
ratios across the hybridizations. 

To assess the contribution of artefactual sources of variation in 
the experimentally measured expression patterns, K562 and 
MCF7 cell lines were each grown in three independent cultures, 
and the entire process was carried out independently on mRNA 
extracted from each culture. The variance in the triplicate fluo- 
rescence ratio measurements approached a minimum when the 
fluorescence signal was greater than approximately 0.4% of the 
measurable total signal dynamic range above background in 
either channel of the hybridization. We selected the subset of 
spots for which significant signal was present in both the numer- 
ator and denominator of the ratios by this criterion to identify 
the best-measured spots. The pair-wise correlation coefficients 
for the triplicates of the set of genes that passed this quality con- 
trol level (6,992 spots included for the MCF7 samples and 6,161 
spots for K562) ranged from 0.83 to 0.92 (for graphs and details, 
see http://genome-www.stanford.edu/nci60). 
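
The spot selection and reproducibility check described above can be sketched as follows. This is a minimal illustration, not the authors' code: the 0.4% threshold comes from the text, whereas the function names, the `dynamic_range` argument and the choice to correlate log2 ratios are assumptions made for the example.

```python
import math
from itertools import combinations

def well_measured(cy5, cy3, dynamic_range, fraction=0.004):
    """True if both net signals exceed `fraction` of the dynamic range."""
    threshold = fraction * dynamic_range
    return cy5 > threshold and cy3 > threshold

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def replicate_correlations(replicates, dynamic_range):
    """replicates: one list of (cy5, cy3) net signals per hybridization,
    all in the same spot order. Keeps the spots that are well measured in
    every replicate and returns (kept indices, pairwise Pearson r values)."""
    n_spots = len(replicates[0])
    keep = [i for i in range(n_spots)
            if all(well_measured(rep[i][0], rep[i][1], dynamic_range)
                   for rep in replicates)]
    # Assumption: correlations are computed on log2(Cy5/Cy3) ratios.
    logs = [[math.log2(rep[i][0] / rep[i][1]) for i in keep]
            for rep in replicates]
    return keep, [pearson(a, b) for a, b in combinations(logs, 2)]
```

With three replicate hybridizations this yields three pairwise coefficients, which is the form in which the reproducibility range is reported above.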

To make the orderly features in the data more apparent, we used 
a hierarchical clustering algorithm 19,20 and a pseudo-colour visualization
matrix 3,21 . The object of the clustering was to group cell
lines with similar repertoires of expressed genes and to group 
genes whose expression level varied among the 60 cell lines in a 
similar manner. Clustering was performed twice using different 
subsets of genes to assess the robustness of the analysis. In one case 
(Fig. 1), we concentrated on those genes that showed the most 
variation in expression among the 60 cell lines (1,167 total). A second
analysis (Fig. 2) included all spots that were thought to be well
measured in the reference set (6,831 spots). 

Gene expression patterns related to the histologic 
origins of the cell lines 

The most notable property of the clustered data was that cell lines 
with common presumptive tissues of origin grouped together 
(Figs 1a and 2). Cell lines derived from leukaemia, melanoma,
central nervous system, colon, renal and ovarian tissue were clus- 
tered into independent terminal branches specific to their respec- 
tive organ types with few exceptions. Cell lines derived from 
non-small-cell lung carcinoma and breast tumours were distributed
in multiple different terminal branches suggesting that their gene 
expression patterns were more heterogeneous. 

Many of these coherent cell line clusters were distinguished by 
the specific expression of characteristic groups of genes 
(Fig. 3a-d). For example, a cluster of approximately 90 genes was 
highly expressed in the melanoma-derived lines (Fig. 3c). This set 
was enriched for genes with known roles in melanocyte biology, 
including tyrosinase and dopachrome tautomerase (TYR and 
DCT; two subunits of an enzyme complex involved in melanin 
synthesis 22 ), MART1 (MLANA; which is being investigated as a
target for immunotherapy of melanoma 23 ) and S100-β (S100B;
which has been used as an antigenic marker in the diagnosis of 
Fig. 2 Gene expression patterns related to
other cell-line phenotypes. a, We applied
two-dimensional hierarchical clustering to 
expression data from a set of 6,831 cDNAs 
measured across the 64 cell lines. The 6,831 
cDNAs were those with a minimum fluores- 
cence signal intensity of approximately 0.4% 
of the dynamic range above background in 
the reference channel in each of the six 
hybridizations used to establish reproducibil- 
ity. This effectively selected those spots that 
provided the most reliable ratio measure- 
ments and therefore identified a subset of 
genes useful for exploring patterns comprised 
of those whose variation in expression across 
the 60 cell lines was of moderate magnitude. 
b, Cluster-ordered data table. c, Doubling
time of cell lines. Cell lines are given in cluster 
order. Values are plotted relative to the mean. 
Doubling times greater than the mean are 
shown in green, those with doubling time less 
than the mean are shown in red. d, Three 
related gene clusters that were enriched for 
genes whose expression level variation was 
correlated with cell line proliferation rate. 
Each of the three gene clusters (clustered 
solely on the basis of their expression pat- 
terns) showed enrichment for sets of genes 
involved in distinct functional categories (for 
example, ribosomal genes versus genes 
involved in pre-RNA splicing). e, Gene cluster
in which all characterized and sequence-verified
cDNAs encode genes known to be regulated
by interferons. f, Gene cluster enriched
for genes that have been implicated in drug 
metabolism (indicated by asterisks). A further 
property of the gene clustering evident here 
and in Fig. 2 is the strong tendency for redun- 
dant representations of the same gene to 
cluster immediately adjacent to one another, 
even within larger groups of genes with very 
similar expression patterns. In addition to 
illustrating the reproducibility and consis- 
tency of the measurements, and providing 
independent confirmation of many of our 
measurements, this property also demon- 
strates that these, and probably all, genes 
have nearly unique patterns of variation 
across the 60 cell lines. If this were not the 
case, and multiple genes had identical pat- 
terns of variation, we would not expect to be 
able to distinguish, by clustering on the basis 
of expression variation, duplicate copies of 
individual genes from the other genes with 
identical expression patterns. 
melanoma). LOXIMVI, the seventh line designated as melanoma
in the NCI60, did not show this characteristic pattern. Although
isolated from a patient with melanoma, LOXIMVI has previously
been noted to lack melanin and other markers useful for identifi- 
cation of melanoma cells 1 . 

Paradoxically, two related cell lines (MDA-MB435 and MDA- 
N), which were derived from a single patient with breast cancer 
and have been conventionally regarded as breast cancer cell lines, 
shared expression of the genes associated with melanoma. MDA- 
MB435 was isolated from a pleural effusion in a patient with 
metastatic ductal adenocarcinoma of the breast 24,25 . It remains 
possible that the origin of the cell line was a breast cancer, and that 
its gene expression pattern is related to the neuroendocrine fea- 
tures of some breast cancers 26 . But our results suggest that this cell 
line may have originated from a melanoma, raising the possibility 
that the patient had a co-existing occult melanoma. 

The higher-level organization of the cell-line tree — in which 
groups span cell lines from different tissue types — also reflected 
shared biological properties of the tissues from which the cell 
lines were derived. The carcinoma-derived cell lines were divided 
into major branches that separated those that expressed genes 
characteristic of epithelial cells from those that expressed genes 
more typical of stromal cells. A cluster of genes is shown (Fig. 3b) 
that is most strongly expressed in cell lines derived from colon 
carcinomas, six of seven ovarian-derived cell lines and the two 
breast cancer lines positive for the oestrogen receptor. The named 
genes in this cluster have been implicated in several aspects of 
epithelial cell biology 27 . The cluster was enriched for genes whose 
products are known to localize to the basolateral membrane of 
epithelial cells, including those encoding components of 
adherens complexes (for example, desmoplakin (DSP), 
periplakin (PPL) and plakoglobin (JUP)), an epithelial- 
expressed cell-cell adhesion molecule (M4S1) and a sodium/ 
hydrogen ion exchanger 28-31 (SLC9A1). It also contained genes
that encode putative transcriptional regulators of epithelial mor- 
phogenesis, a human homologue of a Drosophila melanogaster 
epithelial-expressed tumour suppressor (LLGL1) and a homeo- 
box gene thought to control calcium-mediated adherence in 
epithelial cells 32,33 (MSX2).

In contrast, a separate, major branch of the cell-line dendrogram
(Fig. 1a) included all glioblastoma-derived cell lines, all
renal-cell-carcinoma-derived cell lines and the remaining carci- 
noma-derived lines. The characteristic set of genes expressed in 
this cluster included many whose products are involved in stro- 
mal cell functions (Fig. 3d). Indeed, the two cell lines originally 
described as 'sarcoma-like' in appearance (Hs578T, breast carci- 
nosarcoma, and SF539, gliosarcoma) expressed most of these 
genes 34,35 . Although no single gene was uniformly characteristic
of this cluster, each cell line showed a distinctive pattern of 
expression of genes encoding proteins with roles in synthesis or 
modification of the extracellular matrix (for example, caldesmon 
(CALD1), cathepsins, thrombospondin (THBS), lysyl oxidase 
(LOX) and collagen subtypes). Although the ovarian and most 
non-small-cell-lung-derived carcinomas expressed genes charac- 
teristic of both epithelial cells and stromal cells, they probably 
clustered with the CNS and renal cell carcinomas in this analysis 
because genes characteristically expressed in stromal cells were 
more abundantly represented in this gene set. 

Physiological variation reflected 
in gene expression patterns 

A cluster diagram of 6,831 genes (Fig. 2) is useful for exploring 
clusters of genes whose variation in mRNA levels was not obvi- 
ously attributable to cell or tissue type. We identified some gene 
clusters that were enriched for genes involved in specific cellular 
processes; the variation in their expression levels may reflect corresponding
differences in activity of these processes in the cell
lines. For example, a cluster of 1,159 genes (Fig. 2a) included 
many whose products are necessary for progression through the 
cell cycle (such as CCNA1, MCM106 and MAD2L1), RNA pro- 
cessing and translation machinery (such as RNA helicases, 
hnRNPs and translation elongation factors) and traditional 
pathologic markers used to identify proliferating cells (MKI67).
Within this large cluster were smaller clusters enriched for genes 
with more specialized roles. One cluster was highly enriched for 
numerous ribosomal genes, whereas another was more enriched 
for genes encoding RNA-splicing factors. The variation in 
expression of these ribosomal genes was significantly correlated 
with variation in the cell doubling time (correlation coefficient of 
0.54), supporting the notion that the genes in this cluster were 
regulated in relation to cell proliferation rate or growth rate in 
these cell lines. 

In a smaller gene cluster (Fig. 2d), all of the named genes were 
previously known to be regulated by interferons 13,36 . Additional
groups of interferon-regulated genes showed distinct patterns of 
expression (data not shown), suggesting that the NCI60 cell lines 
exhibited variation in activity of interferon-response pathways, 
which was reflected in gene expression patterns 36 . 

Another cluster (Fig. 2e) contained several genes encoding
proteins with possible interrelated roles in drug metabolism, 
including glutamate-cysteine ligase (GLCLC, the enzyme responsible
for the rate-limiting step of glutathione synthesis), thioredoxin
(TXN) and thioredoxin reductase (TXNRD1; enzymes
involved in regulating redox state in cells), and MRP1 (a drug 
transporter known to efficiently transport glutathione-conju- 
gated compounds 37 ). The elevated expression of this set of genes 
in a subset of these cell lines may reflect selection for resistance to 
chemotherapeutics. 

Cell lines facilitate interpretation of gene expression 
patterns in complex clinical samples 

Like many other types of cancer, tumours of the breast typically 
have a complex histological organization, with connective tissue 
and leukocytic infiltrates interwoven with tumour cells. To 
explore the possibility that variation in gene expression in the 
tumour cell lines might provide a framework for interpreting the 
expression patterns in tumour specimens, we compared RNA 
isolated from two breast cancer biopsy samples, a sample of nor- 
mal breast tissue and the NCI60 cell lines derived from breast 
cancers (excluding MDA-MB-435 and MDA-N) and leukaemias 
(Fig. 4). This clustering highlighted features of the gene expres- 
sion pattern shared between the cancer specimens and individual 
cell lines derived from breast cancers and leukaemias. 

The genes encoding keratin 8 (KRT8) and keratin 19 (KRT19), 
as well as most of the other 'epithelial' genes defined in the com- 
plete NCI60 cell line cluster, were expressed in both of the biopsy 
samples and the two breast-derived cell lines, MCF-7 and T47D, 
expressing the oestrogen receptor, suggesting that these tran- 
scripts originated in tumour cells with features similar to those of 
luminal epithelial cells (Fig. 5a). Expression of a set of genes char- 
acteristic of stromal cells, including collagen genes (COL3A1, 
COL5A1 and COL6A1) and smooth muscle cell markers 
(TAGLN), was a feature shared by the tumour sample and the 
stromal-like cell lines Hs578T and BT549 (Fig. 5b). This feature
of the expression pattern seen in the tumour samples is likely to 
be due to the stromal component of the tumour. The tumours 
also shared expression of a set of genes (Fig. 5c) with the multiple 
myeloma cell line (RPMI-8226), notably including 
immunoglobulin genes, consistent with the presence of B cells 
in the tumour (this was confirmed by staining with anti- 
[Fig. 3 panel labels: melanoma cluster (16 ESTs); mesenchymal cluster (67 ESTs)]

Fig. 3 Gene clusters related to tissue characteristics in the cell lines. Enlargements of the regions of the cluster diagram in Fig. 1 showing gene clusters enriched
for genes expressed in cell lines of ostensibly similar origins. a, Cluster of genes highly expressed in the leukaemia-derived cell lines. Two sub-clusters distinguish
genes that were expressed in most leukaemia-derived lines from those expressed exclusively in the erythroblastoid line, K562 (note that the triplicate
hybridizations cluster together). b, Cluster of genes highly expressed in all colon (7/7) cell lines and all breast-derived cell lines positive for the oestrogen receptor (2/2). This
set of genes was also moderately expressed in most ovarian lines (5/6) and some non-small-cell-lung (4/6) lines, but was expressed at a lower level in all
renal-cancer-derived lines. c, Cluster of genes highly expressed in most melanoma-derived lines (6/7) and two related lines ostensibly derived from breast cancer
(MDA-MB435 and MDA-N). d, Cluster of genes highly expressed in all glioblastoma (6/6) lines and most lines derived from renal-cell carcinoma (7/8), and more
moderately expressed in a subset of carcinoma-derived lines. In all panels, names are shown only for all known genes whose identities were independently
re-verified by sequencing. The number of sequence-validated ESTs within the cluster is indicated below the cluster in parentheses. The position of gene names in the
adjacent list only approximates their position in the cluster diagram as indicated by the lines connecting the colour chart with the gene list. Complete cluster
images with all gene names and accession numbers are available (http://genome-www.stanford.edu/nci60).

immunoglobulin antibodies; data not shown). Therefore, distinct
sets of genes with co-varying expression among the samples
(Fig. 4, arrow) appear to represent distinct cell types that can be 
distinguished in breast cancer tissue. A fourth cluster of genes, 
more highly expressed in all of the cell lines than in any of the 
clinical specimens, was enriched for genes present in the 'prolif- 
eration' cluster described above (Fig. 5d). The variation in 
expression of these genes likely paralleled the difference in prolif- 
eration rate between the rapidly cycling cultured cell lines and the 
much more slowly dividing cells in tissues. 

Discussion 

Newly available genomics tools allowed us to explore variation in 
gene expression on a genomic scale in 60 cell lines derived from 
diverse tumour tissues. We used a simple cluster analysis to iden- 
tify the prominent features in the gene expression patterns that 
appeared to reflect 'molecular signatures' of the tissue from 
which the cells originated. The histological characteristics of the 
cell lines that dominated the clustering were pervasive enough 
that similar relationships were revealed when alternative subsets 
of genes were selected for analysis. Additional features of the 
expression pattern may be related to variation in physiological 
attributes such as proliferation rate and activity of interferon- 
response pathways. 

The properties of the tumour-derived cell lines in this study 
have presumably all been shaped by selection for resistance to 
host defences and chemotherapeutics and for rapid proliferation 
in the tissue culture environment of synthetic growth media, fetal 
bovine serum and a polystyrene substratum. But the primary 
identifiable factor accounting for variation in gene expression 
patterns among these 60 cell lines was the identity of the tissue 
from which each cell line was ostensibly derived. For most of the 
cell lines we examined, neither physiological nor experimental 
adaptation for growth in culture was sufficient to overwrite the 
gene expression programs established during differentiation in 
vivo. Nevertheless, the prominence of mesenchymal features in 
the cell lines isolated from glioblastomas and carcinomas may 
reflect a selection for the relative ease of establishment of cell 
lines expressing stromal characteristics, perhaps combined with 
physiological adaptation to tissue culture conditions 38-40 .



Fig. 4 Comparison of the gene expression patterns in clinical breast cancer
specimens and cultured breast cancer and leukaemia cell lines. a, Two-dimensional
hierarchical clustering applied to gene expression data for two breast
cancer specimens, a lymph node metastasis from one patient, normal breast 
and the NCI60 breast and leukaemia-derived cell lines. The gene expression 
data from tissue specimens was clustered along with expression data from a 
subset of the NCI60 cell lines to explore whether features of expression pat- 
terns observed in specific lines could be identified in the tissue samples. Labels 
indicate gene clusters (shown in detail in Fig. 5) that may be related to specific
cellular components of the tumour specimens, b, Breast cancer specimen 16 
stained with anti-keratin antibodies, showing the complex mix of cell types
characteristically found in breast tumours. The arrows highlight the different 
cellular components of this tissue specimen that were distinguished by the 
gene expression cluster analysis (Fig. 5). 



Biological themes linking genes with related expression pat- 
terns may be inferred in many cases from the shared attributes of 
known genes within the clusters. Uncharacterized cDNAs are 
likely to encode proteins that have roles similar to those of the 
known gene products with which they appear to be co-regulated.
Still, for several clusters of genes, we were unable to discern a com- 
mon theme linking the identified members of the cluster. Further 
exploration of their variation in expression under more diverse 
conditions and more comprehensive investigation of the physiol- 
ogy of the NCI60 cells may provide insight 10 . The relationship of 
the gene expression patterns to the drug sensitivity patterns mea- 
sured by the DTP is an example of linking variation in gene 
expression with more subtle and diverse phenotypic variation 1 '. 

The patterns of gene expression measured in the NCI60 cell 
lines provide a framework that helps to distinguish the cells that 
express specific sets of genes in the histologically complex breast 
cancer specimens'". Although it is now feasible to analyse gene 
expression in micro-dissected tumour specimens 42,43 , this obser- 
vation suggests that it will be possible to explore and interpret 
some of the biology of clinical tumour samples by sampling them 
intact. As is useful in conventional morphological pathology, one 
might be able to observe interactions between a tumour and its 
microenvironment in this way. These relationships will be clari- 
fied by suitable analysis of gene expression patterns from intact as 
well as dissected tumours 12,14,15,41 .

Methods 

cDNA clones. We obtained the 9,703 human cDNA clones (Research Genet- 
ics) used in these experiments as bacterial colonies in 96-well microtitre 
plates 9 . Approximately 8,000 distinct Unigene clusters (representing nomi- 
nally unique genes) were represented in this set of clones. All genes identi- 
fied here by name represent clones whose identities were confirmed by re- 
sequencing, or by the criteria that two or more independent cDNA clones 
ostensibly representing the same gene had nearly identical gene expression 
patterns. A single-pass 3' sequence re-verification was attempted for every 
clone after re-streaking for single colonies. For a subset of genes for which 
quality 3' sequence was not obtained, we attempted to confirm identities by 
5' sequencing. Of the subset of clones selected for 5' sequence verification 
on the basis of an interesting pattern of expression (888 total), 331 were correctly
identified, 57 incorrectly identified and 500 indeterminate (poor
quality sequence). We estimated that 15%-20% of array elements contained
DNA representing more than one clone per well. So far, the identities of
~3,000 clones have been verified. The full list of clones used and their nominal
identities are available (gene names preceded by the designation "SID#"
(Stanford Identification) represent clones whose identities have not yet been 
verified; http://genome-www.stanford.edu:8000/nci60). 

Production of cDNA microarrays. The arrays used in this experiment were 
produced at Synteni Inc. (now Incyte Pharmaceuticals). Each insert was 
amplified from a bacterial colony by sampling 1 µl of bacterial media and
performing PCR amplification of the insert using consensus primers for
the three plasmids represented in the clone set (5'-TTGTAAAACGACGGCCAGTG-3',
5'-CACACAGGAAACAGCTATG-3'). Each PCR product
(100 µl) was purified by gel exclusion, concentrated and resuspended in
3xSSC (10 µl). The PCR products were then printed on treated glass
microscope slides using a robot with four printing tips. Detailed protocols
for assembling and operating a microarray printer, and printing and experimental
application of DNA microarrays are available
(http://cmgm.stanford.edu/pbrown).

Preparation of mRNA and reference pool. Cell lines were grown from NCI 
DTP frozen stocks in RPMI-1640 supplemented with phenol red, glutamine
(2 mM) and 5% fetal calf serum. To minimize the contribution of variations 
in culture conditions or cell density to differential gene expression, we grew 
each cell line to 80% confluence and isolated mRNA 24 h after transfer to 
fresh medium. The time between removal from the incubator and lysis of the 
cells in RNA stabilization buffer was minimized (< 1 min). Cells were lysed in 
buffer containing guanidinium isothiocyanate and total RNA was purified
with the RNeasy purification kit (Qiagen). We purified mRNA as needed
using a poly(A) purification kit (Oligotex, Qiagen) according to the manufacturer's
instructions; the integrity of the mRNA and its relative contamination with
ribosomal RNA were assessed by denaturing agarose gel electrophoresis.

The breast tumours were surgically excised from patients and rapidly 
transported to the pathology laboratory, where samples for microarray 
analysis were quickly frozen in liquid nitrogen and stored at -80 °C until 
use. A frozen tumour specimen was removed from the freezer, cut into 
small pieces (~50-100 mg each), immediately placed into 10-12 ml of Trizol
reagent (Gibco-BRL) and homogenized using a PowerGen 125 Tissue
Homogenizer (Fisher Scientific), starting at 5,000 r.p.m. and gradually
increasing to ~20,000 r.p.m. over a period of 30-60 s. We processed the
Trizol/tumour homogenate as described in the Trizol protocol, including an
initial step to remove fat. Once total RNA was obtained, we isolated mRNA 
with a FastTrack 2.0 kit (Invitrogen) using the manufacturer's protocol for 
isolating mRNA starting from total RNA. The normal breast samples were 
obtained from Clontech. 
Fig. 5 Histologic features of breast cancer biopsies can be recognized and parsed based on gene expression patterns. Enlargements of the regions of the cluster
diagram in Fig. 4 showing gene clusters enriched for genes expressed in different cell types in the breast cancer specimens, as distinguished by clustering with the
cultured cell lines. a, A cluster including many genes characteristic of epithelial cells expressed in cell lines (T47D and MCF7) derived from breast cancer positive for
the oestrogen receptor and tumours. b, Genes expressed in cell lines derived from breast cancer with stromal cell characteristics (Hs578T and BT549) and tumour
specimens. Expression of these genes in the tumour samples may reflect the presence of myofibroblasts in the cancer specimen stroma. c, Genes expressed in
leukocyte-derived cell lines, showing common leukocyte, and separate 'myeloid' and 'B-cell', gene clusters. d, Genes that were relatively highly expressed in all cell lines
compared with the tumour specimens and normal breast. The higher expression of this set of genes involved in cell cycle transit in the cell lines is likely to reflect the 
higher proliferative rate of cells cultured in the presence of serum compared with the average proliferation rate of cells in the biopsied tissue. 
We combined mRNA from the following cells in equal quantities to
make the reference pool: HL-60 (acute myeloid leukaemia) and K562
(chronic myeloid leukaemia); NCI-H226 (non-small-cell-lung); COLO
205 (colon); SNB-19 (central nervous system); LOX-IMVI (melanoma);
OVCAR-3 and OVCAR-4 (ovarian); CAKI-1 (renal); PC-3 (prostate); and
MCF7 and Hs578T (breast). The criteria for selection of the cell lines in
the reference are described in detail in the accompanying manuscript 12 .

Doubling-time calculations. We calculated doubling times based on routine
NCI60 cell line compound screening data, and they reflect the doubling
times for cells inoculated into 96-well plates at the screening inoculation
densities and grown in RPMI 1640 medium supplemented with 5%
fetal bovine serum for 48 h. We measured cell populations using a
sulforhodamine B optical density assay. The doubling time constant k
was calculated using the equation $N/N_0 = e^{kt}$, where $N_0$ is the optical density
for control (untreated) cells at time zero, $N$ is the optical density for control cells
after 48-h incubation, and $t$ is 48 h. The same equation was then used with the
derived k to calculate the doubling time t by setting $N/N_0 = 2$. For a given cell
line, we obtained $N_0$ and $N$ values by averaging optical densities (n > 6,000)
obtained for each cell line for a year's screening. Data and experimental details
are available (http://dtp.nci.nih.gov).
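
The doubling-time calculation described above reduces to two steps: solve N/N0 = e^(kt) for k, then set N/N0 = 2 and solve for t. The sketch below illustrates this; the function name and the example optical densities are hypothetical and are not taken from the screening data.

```python
import math

def doubling_time(od_start, od_end, hours=48.0):
    """Doubling time (h) from optical densities at time zero and after `hours`.

    Step 1: k = ln(N / N0) / t      (from N/N0 = e^(k t))
    Step 2: t_double = ln(2) / k    (from setting N/N0 = 2)
    """
    if od_start <= 0 or od_end <= od_start:
        raise ValueError("expected 0 < od_start < od_end for a growing culture")
    k = math.log(od_end / od_start) / hours
    return math.log(2) / k

# Hypothetical example: a fourfold increase in optical density over 48 h
# corresponds to two doublings, i.e. a doubling time of about 24 h.
example = doubling_time(0.2, 0.8)
```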

Preparation and hybridization of fluorescent labelled cDNA. For each
comparative array hybridization, labelled cDNA was synthesized by reverse
transcription from test cell mRNA in the presence of Cy5-dUTP, and from
the reference mRNA with Cy3-dUTP, using the Superscript II reverse-transcription
kit (Gibco-BRL). For each reverse transcription reaction, mRNA
(2 µg) was mixed with an anchored oligo-dT (d-20T-d(AGC)) primer (4
µg) in a total volume of 15 µl, heated to 70 °C for 10 min and cooled on ice.
To this sample, we added an unlabelled nucleotide pool (0.6 µl; 25 mM
each dATP, dCTP, dGTP, and 15 mM dTTP), either Cy3 or Cy5 conjugated
dUTP (3 µl; 1 mM; Amersham), 5x first-strand buffer (6 µl; 250 mM Tris-HCl,
pH 8.3, 375 mM KCl, 15 mM MgCl2), 0.1 M DTT (3 µl) and 2 µl of
Superscript II reverse transcriptase (200 U/µl). After a 2-h incubation at 42
°C, the RNA was degraded by adding 1 N NaOH (1.5 µl) and incubating at
70 °C for 10 min. The mixture was neutralized by adding 1 N HCl (1.5
µl), and the volume brought to 500 µl with TE (10 mM Tris, 1 mM EDTA).
We added Cot1 human DNA (20 µg; Gibco-BRL), and purified the probe
by centrifugation in a Centricon-30 micro-concentrator (Amicon). The
two separate probes were combined, brought to a volume of 500 µl, and
concentrated again to a volume of less than 7 µl. We added 10 µg/µl
poly(A) RNA (1 µl; Sigma) and tRNA (10 µg/µl; Gibco-BRL),
and adjusted the volume to 9.5 µl with distilled water. For final probe
preparation, 20xSSC (2.1 µl; 1.5 M NaCl, 150 mM NaCitrate, pH 8.0) and
10% SDS (0.35 µl) were added to a total final volume of 12 µl. The probes
were denatured by heating for 2 min at 100 °C, incubated at 37 °C for
20-30 min, and placed on the array under a 22 mm x 22 mm glass coverslip.
We incubated slides overnight at 65 °C for 14-18 h in a custom slide chamber
with humidity maintained by a small reservoir of 3xSSC. Arrays were
washed by submersion and agitation for 2-5 min in 2xSSC with 0.1% SDS,
followed by 1xSSC and then 0.1xSSC. The arrays were "spun dry" by
centrifugation for 2 min at 650 r.p.m. in a slide-rack in a Beckman GS-6 tabletop
centrifuge in Microplus carriers.
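
As a quick arithmetic check on the final probe mixture, the nominal concentrations after dilution follow from C_final = C_stock x V_stock / V_total. The sketch below applies this to the stock strengths and volumes stated above; the helper name is hypothetical, and the results are nominal values implied by the stated volumes rather than figures reported by the authors.

```python
def final_concentration(stock, stock_volume_ul, total_volume_ul):
    """Concentration after diluting `stock_volume_ul` of `stock` to `total_volume_ul`."""
    return stock * stock_volume_ul / total_volume_ul

# Volumes taken from the text: 2.1 ul of 20x SSC and 0.35 ul of 10% SDS
# in a final volume of 12 ul.
ssc_fold = final_concentration(20, 2.1, 12)       # about 3.5x SSC
sds_percent = final_concentration(10, 0.35, 12)   # about 0.29% SDS
```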

Array quantitation and data processing. Following hybridization, arrays 
were scanned using a laser-scanning microscope (ref. 17;
http://cmgm.stanford.edu/pbrown). Separate images were acquired for Cy3 and Cy5. We
carried out data reduction with the program ScanAlyze (M.B.E., available
at http://rana.stanford.edu/software). Each spot was defined by manual
positioning of a grid of circles over the array image. For each fluorescent
image, the average pixel intensity within each circle was determined, and a
local background was computed for each spot equal to the median pixel 
intensity in a square of 40 pixels in width and height centred on the spot 
centre, excluding all pixels within any defined spots. Net signal was deter- 
mined by subtraction of this local background from the average intensity 
for each spot. Spots deemed unsuitable for accurate quantitation because 
of array artefacts were manually flagged and excluded from further analy- 
sis. Data files generated by ScanAlyze were entered into a custom database 
that maintains web-accessible files. Signal intensities between the two fluo- 
rescent images were normalized by applying a uniform scale factor to all 
intensities measured for the Cy5 channel. The normalization factor was 
chosen so that the mean log(Cy3/Cy5) for a subset of spots that achieved a 
minimum quality parameter (approximately 6,000 spots) was 0. This effec- 
tively defined the signal-intensity-weighted 'average' spot on each array to 
have a Cy3/Cy5 ratio of 1.0.
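
The background subtraction and channel normalization described above can be sketched as follows. This is an illustrative reading of the text, not the ScanAlyze implementation: the function names are hypothetical, pixel lists are assumed to have been extracted already, and the natural logarithm is assumed (the choice of base does not affect the scale factor).

```python
import math
from statistics import mean, median

def net_signal(spot_pixels, background_pixels):
    """Average intensity inside the spot circle minus the median of the
    surrounding local-background pixels."""
    return mean(spot_pixels) - median(background_pixels)

def cy5_scale_factor(cy3, cy5):
    """Uniform factor s such that mean(log(Cy3 / (s * Cy5))) == 0
    over the supplied (quality-filtered) spots."""
    return math.exp(mean(math.log(g / r) for g, r in zip(cy3, cy5)))

def normalize_cy5(cy3, cy5, quality_idx):
    """Scale every Cy5 intensity by the factor derived from the quality spots."""
    s = cy5_scale_factor([cy3[i] for i in quality_idx],
                         [cy5[i] for i in quality_idx])
    return [s * r for r in cy5]
```

Because the factor is the exponential of a mean log ratio, it equals the geometric mean of the Cy3/Cy5 ratios over the quality spots.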

Cluster analysis. We extracted tables (rows of genes, columns of individual 
microarray hybridizations) of normalized fluorescence ratios from the data- 
base. Various selection criteria, discussed in relation to each data set, were 
applied to select subsets of genes from the 9,703 cDNA elements on the 
arrays. Before clustering and display, the logarithms of the measured fluorescence
ratios for each gene were centred by subtracting the arithmetic mean of
all ratios measured for that gene. The centring makes all subsequent analyses 
independent of the amount of each gene's mRNA in the reference pool. 
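
The centring step, and the reason it removes any dependence on the reference pool, can be illustrated with a short sketch. The function name is hypothetical and base-2 logarithms are an assumption, since the text does not state the base.

```python
import math

def centre_log_ratios(ratios):
    """Log-transform one gene's ratios across all hybridizations and
    subtract that gene's mean log ratio."""
    logs = [math.log2(r) for r in ratios]
    gene_mean = sum(logs) / len(logs)
    return [v - gene_mean for v in logs]

# A change in the reference abundance multiplies every ratio for the gene
# by the same constant; that constant becomes an additive offset in log
# space and is removed by the centring.
original = centre_log_ratios([1.0, 2.0, 4.0])
rescaled = centre_log_ratios([3.0, 6.0, 12.0])
```

In this example `original` and `rescaled` agree to within floating-point error, which is the sense in which subsequent analyses are independent of the amount of each gene's mRNA in the reference pool.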

We applied a hierarchical clustering algorithm separately to the cell lines 
and genes using the Pearson correlation coefficient as the measure of similarity
and average linkage clustering 3,19-21 . The results of this process are
two dendrograms (trees), one for the cell lines and one for the genes, in 
which very similar elements are connected by short branches, and longer 
branches join elements with diminishing degrees of similarity. For visual 
display the rows and columns in the initial data table were reordered to 
conform to the structures of the dendrograms obtained from the cluster 
analysis. Each cell in the cluster-ordered data table was replaced by a graded 
colour (pure red through black to pure green), representing the mean- 
adjusted ratio value in the cell. Gene labels in cluster diagrams are dis- 
played here only for genes that were represented in the microarray by 
sequence-verified cDNAs. A complete software implementation of this 
process is available (http://rana.stanford.edu/software), as well as all clus- 
tering results (http://genome-www.stanford.edu/nci60). 
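
For readers who want to see the clustering procedure in miniature, the sketch below builds a tree by agglomerative clustering with Pearson correlation as the similarity measure. It is not the software referenced above. It assumes the textbook definition of average linkage (the similarity of two clusters is the mean of all pairwise leaf similarities), which may differ in detail from the referenced implementation, and all names are hypothetical. The naive search is adequate only for small inputs.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient (undefined for constant profiles)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_linkage(profiles):
    """profiles: dict mapping a label to its list of centred log ratios.
    Returns (tree, order): a nested-tuple dendrogram and the leaf order
    read from left to right across that tree."""
    sim = {frozenset((a, b)): pearson(profiles[a], profiles[b])
           for a, b in combinations(profiles, 2)}

    def cluster_sim(m1, m2):
        # Mean pairwise similarity between the leaves of two clusters.
        total = sum(sim[frozenset((a, b))] for a in m1 for b in m2)
        return total / (len(m1) * len(m2))

    # Each node is (member leaves, subtree).
    nodes = [((name,), name) for name in profiles]
    while len(nodes) > 1:
        i, j = max(combinations(range(len(nodes)), 2),
                   key=lambda p: cluster_sim(nodes[p[0]][0], nodes[p[1]][0]))
        merged = (nodes[i][0] + nodes[j][0], (nodes[i][1], nodes[j][1]))
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    members, tree = nodes[0]
    return tree, list(members)
```

Running the function once on rows (genes) and once on columns (cell lines) gives the two leaf orders needed to rearrange the data table for display.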